
Showing papers in "Journal of the Acoustical Society of America in 1988"


Journal ArticleDOI
TL;DR: It is concluded that the four channels work in conjunction at threshold to create an operating range for the perception of vibration extending from at least 0.4 Hz to greater than 500 Hz, and that the perceptual qualities of touch may be determined by the combined inputs from the four channels.
Abstract: Although previous physiological and anatomical experiments have identified four afferent fiber types (PC, RA, SA II, and SA I) in glabrous (nonhairy) skin of the human somatosensory periphery, only three have been shown to mediate tactile (mechanoreceptive) sensation. Psychophysical evidence that four channels (P, NP I, NP II, and NP III) do, indeed, participate in the perceptual process is presented. In a series of experiments involving selective masking of the various channels, modification of the skin‐surface temperature, and testing cutaneous sensitivity down to very low vibratory frequencies, the fourth psychophysical channel (NP III) is defined. Based on these experiments and previous work from our laboratory, it is concluded that the four channels work in conjunction at threshold to create an operating range for the perception of vibration that extends from at least 0.4 to greater than 500 Hz. Each of the four channels appears to mediate specific portions of the overall threshold‐frequency characteristic. Selection of appropriate neural‐response criteria from previously published physiological data and correlation of their derived frequency characteristics with the four psychophysical channels indicates that each channel has its own physiological substrate: P channel and PC fibers, NP I channel and RA fibers, NP II channel and SA II fibers, and NP III channel and SA I fibers. These channels partially overlap in their absolute sensitivities, making it likely that suprathreshold stimuli may activate two or more of the channels at the same time. Thus the perceptual qualities of touch may be determined by the combined inputs from four channels.

885 citations


Journal ArticleDOI
TL;DR: In this article, the validity of the low-order perturbation approximation for rough surface scattering is examined by comparison with exact results obtained by solving an integral equation and through comparison of low-order perturbation predictions with higher-order predictions.
Abstract: The validity of the perturbation approximation for rough surface scattering is examined (1) by comparison with exact results obtained by solving an integral equation and (2) through comparison of low‐order perturbation predictions with higher‐order predictions. The pressure release boundary condition is assumed, and the field quantity calculated is the bistatic scattering cross section. A Gaussian roughness spectrum is used, and the surfaces have height variations in only one direction. It is found, in general, that the condition kh≪1 (k is the acoustic wavenumber, h is the rms surface height) is insufficient to guarantee the accuracy of first‐order (or higher‐order) perturbation theory. When the surface correlation length l becomes too large or too small with h held fixed, higher‐order perturbation terms can make larger contributions to the scattering cross section than lower‐order terms. An explanation for this result is given. The regions of validity for low‐order perturbation theory are also given. Th...

831 citations


Journal ArticleDOI
Ingo R. Titze1
TL;DR: It is shown that vocal tract inertance reduces the oscillation threshold pressure, whereas vocal tract resistance increases it, and the treatment is harmonized with former treatments based on two-mass models and collapsible tubes.
Abstract: A theory of vocal fold oscillation is developed on the basis of the body‐cover hypothesis. The cover is represented by a distributed surface layer that can propagate a mucosal surface wave. Linearization of the surface‐wave displacement and velocity, and further small‐amplitude approximations, yields closed‐form expressions for conditions of oscillation. The theory predicts that the lung pressure required to sustain oscillation, i.e., the oscillation threshold pressure, is reduced by reducing the mucosal wave velocity, by bringing the vocal folds closer together and by reducing the convergence angle in the glottis. The effect of vocal tract acoustic loading is included. It is shown that vocal tract inertance reduces the oscillation threshold pressure, whereas vocal tract resistance increases it. The treatment, which is applicable to falsetto and breathy voice, as well as onset or release of phonation in the absence of vocal fold collision, is harmonized with former treatments based on two‐mass models and ...

815 citations


Journal ArticleDOI
TL;DR: Recordings of syllable sequences produced in soft, normal, and loud voice showed that, relative to normal voice, both males and females produced loud voice with increased pressure, accompanied by increased ac flow and increased maximum airflow declination rate.
Abstract: Measurements on the inverse filtered airflow waveform (the "glottal waveform") and of estimated average transglottal pressure and glottal airflow were made from noninvasive recordings of productions of syllable sequences in soft, normal, and loud voice for 25 male and 20 female speakers. Statistical analyses showed that with change from normal to loud voice, both males and females produced loud voice with increased pressure, accompanied by increased ac flow and increased maximum airflow declination rate. With change from normal voice, soft voice was produced with decreased pressure, ac flow and maximum airflow declination rate, and increased dc and average flow. Within the loudness conditions, there was no significant male-female difference in air pressure. Several glottal waveform parameters separated males and females in normal and loud voice. The data indicate higher ac flow and higher maximum airflow declination rate for males. In soft voice, the male and female glottal waveforms were more alike, and there was no significant difference in maximum airflow declination rate. The dc flow did not differ significantly between males and females. Possible relevance to biomechanical differences and differences in voice source characteristics between males and females and across loudness conditions is discussed.

616 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present an analytical approach to express any axisymmetric beam field in a simple analytical form, by superposition of Gaussian beams about the same axis but with beam waists of different sizes located at different positions along the axis.
Abstract: The diffraction field of a Gaussian planar velocity distribution is a Gaussian beam function under the condition (ka)2≫1. This property makes a series of Gaussian functions attractive as a possible base function set. The new approach presented enables one to express any axisymmetric beam field in a simple analytical form—the superposition of Gaussian beams about the same axis but with beam waists of different sizes located at different positions along the axis. A computer optimization is used to evaluate the coefficients, as well as the beam waists and their positions. The extreme case of a piston radiator is used to test the approach. Good agreement between a ten‐term Gaussian beam solution and the results of numerical integration (or analytical solution on axis) is obtained throughout the beam field: in the farfield, the transition region, and the nearfield. Discrepancies exist only in the extreme nearfield (<0.1 times the Fresnel distance). For surface velocity distributions that are less discontinuous...
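The superposition idea can be illustrated with a toy fit: approximate a piston's uniform aperture distribution by a sum of real Gaussians with fixed widths, solving only for the coefficients by linear least squares. This is a sketch only; the widths below are hypothetical values, and the paper's computer optimization also adjusts complex beam waists and their axial positions.

```python
import numpy as np

def fit_gaussian_superposition(r, target, widths):
    """Fit target(r) ~ sum_n A_n * exp(-(r/w_n)**2) by linear least squares.

    The widths w_n are assumed (hypothetical) 1/e radii; only the
    coefficients A_n are solved for, so the problem stays linear.
    """
    basis = np.exp(-(r[:, None] / widths[None, :]) ** 2)
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return coeffs, basis @ coeffs

# Extreme test case from the abstract: a piston of radius a = 1,
# i.e., uniform velocity inside the aperture and zero outside.
r = np.linspace(0.0, 2.0, 400)
target = (r <= 1.0).astype(float)
widths = np.linspace(0.3, 1.5, 10)   # hypothetical spread of Gaussian widths
coeffs, approx = fit_gaussian_superposition(r, target, widths)
rms_err = np.sqrt(np.mean((approx - target) ** 2))
```

The residual concentrates at the aperture edge, mirroring the paper's observation that discrepancies survive only where the velocity distribution is most discontinuous.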

588 citations


Journal ArticleDOI
TL;DR: The nature of the acoustic changes that take place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load is discussed, along with the role of training and feedback in controlling and modifying a talker's speech to improve the performance of current speech recognizers.
Abstract: Acoustical analyses were carried out on a set of utterances produced by two male speakers talking in quiet and in 80, 90, and 100 dB SPL of masking noise. In addition to replicating previous studies demonstrating increases in amplitude, duration, and vocal pitch while talking in noise, these analyses also found reliable differences in the formant frequencies and short‐term spectra of vowels. Perceptual experiments were also conducted to assess the intelligibility of utterances produced in quiet and in noise when they were presented at equal S/N ratios for identification. In each experiment, utterances originally produced in noise were found to be more intelligible than utterances produced in the quiet. The results of the acoustic analyses showed clear and consistent differences in the acoustic–phonetic characteristics of speech produced in quiet versus noisy environments. Moreover, these acoustic differences produced reliable effects on intelligibility. The findings are discussed in terms of: (1) the nature of the acoustic changes that take place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load; (2) the role of training and feedback in controlling and modifying a talker’s speech to improve performance of current speech recognizers; and (3) the development of robust algorithms for recognition of speech in noise.

511 citations


Journal ArticleDOI
TL;DR: The model constructed from the males' data correctly classified about 94% of the voiceless stops produced by the female speakers and the classification model held across gender.
Abstract: A statistical procedure for classifying word‐initial voiceless obstruents is described. The data set to which the analysis was applied consisted of monosyllabic words starting with a voiceless obstruent. Each word was repeated six times in the carrier phrase ‘‘I can say —— , again’’ by each of ten speakers. Fast Fourier transforms (FFTs), using a 20‐ms Hamming window, were calculated every 10 ms from the onset of the obstruent through the third cycle of the following vowel. Each FFT was treated as a random probability distribution from which the first four moments (mean, variance, skewness, and kurtosis) were computed. Moments were calculated from linear and Bark transformed spectra. Data were pooled across vowel contexts for speakers of a given gender and input to a discriminant analysis. Using the moments calculated from the linear spectra, 92% of the voiceless stops were classified correctly when dynamic aspects of the stop were included. Even more important, the model constructed from the males’ data correctly classified about 94% of the voiceless stops produced by the female speakers. Classification of the voiceless fricatives when all places of articulation were included in the analysis did not exceed 80% correct when the moments from either the linear or Bark transformed scales were used. However, classification of only the voiceless sibilants was 98% correct when the moments from the Bark transformed spectra were used. As with the stops, the classification model held across gender.
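The moment computation can be sketched by treating the magnitude spectrum of a windowed frame as a probability distribution over frequency and taking its first four moments. This is a sketch under stated assumptions: the exact normalization, framing, and the Bark transform of the published procedure are not reproduced.

```python
import numpy as np

def spectral_moments(frame, fs):
    """First four moments (mean, variance, skewness, kurtosis) of the
    magnitude spectrum, treated as a probability distribution over
    frequency. A sketch of the idea, not the authors' exact pipeline."""
    windowed = frame * np.hamming(len(frame))      # 20-ms Hamming window
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = spectrum / spectrum.sum()                  # normalize to a distribution
    mean = np.sum(freqs * p)
    var = np.sum((freqs - mean) ** 2 * p)
    sd = np.sqrt(var)
    skew = np.sum(((freqs - mean) / sd) ** 3 * p)
    kurt = np.sum(((freqs - mean) / sd) ** 4 * p)
    return mean, var, skew, kurt

# Illustration: a 1-kHz tone in a 20-ms frame at 16 kHz should place the
# spectral mean near 1 kHz with a strongly peaked (leptokurtic) shape.
fs = 16000
t = np.arange(320) / fs
mean, var, skew, kurt = spectral_moments(np.sin(2 * np.pi * 1000.0 * t), fs)
```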

471 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied the physics of matched field processing by modeling the ocean environment as a waveguide that is horizontally stratified with an arbitrary sound speed profile in the vertical.
Abstract: Matched field processing is a parameter estimation technique for localizing the range, depth, and bearing of a point source from the signal field propagating in an acoustic waveguide. The signal is observed at an array in the presence of additive, spatially correlated noise that also propagates in the same ocean environment as the signal. In a weak signal‐to‐noise situation this parameter estimation requires the maximum exploitation of the physics of both the signal and noise structure which then must be coupled to optimum methods for the signal processing. We study the physics of this processing by modeling the ocean environment as a waveguide that is horizontally stratified with an arbitrary sound‐speed profile in the vertical. Thus, the wave equation describes the underlying structure of the signal and noise, and the signal processing via the generation of the replica fields. Two methods of array processing are examined: (i) the linear cross correlator (Bartlett) and (ii) the maximum likelihood method ...

462 citations


PatentDOI
TL;DR: In this article, an in vivo imaging device is provided for producing real-time images of small, moving or stationary cavities and surrounding tissue structure, which includes a probe assembly of very small dimensions and preferably sufficiently small to fit within cavities having a diameter on the order of that of a human coronary artery.
Abstract: An in vivo imaging device is provided for producing real-time images of small, moving or stationary cavities and surrounding tissue structure. The imaging device includes a probe assembly of very small dimensions and preferably sufficiently small to fit within cavities having a diameter on the order of that of a human coronary artery. The probe assembly may be mounted to a positioning device such as a catheter, which allows for the use of, for example, conventional guiding catheters and guide wires.

403 citations


Journal ArticleDOI
TL;DR: In this paper, a new formulation of the dynamics of bubble oscillations is presented in which the internal pressure is obtained numerically and the polytropic approximation is no longer required.
Abstract: The standard approach to the analysis of the pulsations of a driven gas bubble is to assume that the pressure within the bubble follows a polytropic relation of the form p=p0(R0/R)3κ, where p is the pressure within the bubble, R is the radius, κ is the polytropic exponent, and the subscript zero indicates equilibrium values. For nonlinear oscillations of the gas bubble, however, this approximation has several limitations and needs to be reconsidered. A new formulation of the dynamics of bubble oscillations is presented in which the internal pressure is obtained numerically and the polytropic approximation is no longer required. Several comparisons are given of the two formulations, which describe in some detail the limitations of the polytropic approximation.
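The polytropic relation quoted in the abstract is a one-liner; a quick numerical comparison of the isothermal (κ = 1) and adiabatic (κ = 1.4) limits shows how strongly the assumed exponent affects the internal pressure, which is why the approximation matters for nonlinear oscillations. The equilibrium values below are illustrative.

```python
def polytropic_pressure(p0, R0, R, kappa):
    """Internal gas pressure under the polytropic approximation
    p = p0 * (R0 / R)**(3*kappa)."""
    return p0 * (R0 / R) ** (3 * kappa)

p0 = 101325.0   # Pa, illustrative equilibrium pressure
R0 = 10e-6      # m, illustrative equilibrium radius
R = 5e-6        # bubble compressed to half its equilibrium radius

p_iso = polytropic_pressure(p0, R0, R, kappa=1.0)   # isothermal limit
p_adi = polytropic_pressure(p0, R0, R, kappa=1.4)   # adiabatic, diatomic gas
# Halving the radius raises the isothermal pressure 8x, but the adiabatic
# pressure by more than a factor of 18 -- the two limits diverge quickly.
```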

403 citations


PatentDOI
TL;DR: In this paper, an endoscope with an ultrasonic diagnosis function is described, where the probe is arranged in the front end region of the portion to be inserted into a coelom or the like cavity, and is connected to an external rotational drive portion by a hollow multi-layered power transmission means to provide an observation field extending through the entire angular range.
Abstract: An ultrasonic diagnosis device disclosed includes an endoscope with an ultrasonic diagnosis function, wherein an ultrasonic probe is arranged in the front end region of the portion to be inserted into a coelom or the like cavity, and is connected to an external rotational drive portion by a hollow multi-layered power transmission means to provide an observation field extending through the entire angular range. Signal wires from the probe are passed between outer and inner layers of the power transmission means, which thus has a hollow center channel. Preferably, an observation optical system is arranged within the hollow channel to provide a visual field on the front side in the insertion direction, for a continuous confirmation purpose, permitting the insertion portion to be safely inserted even into a narrow cavity of an organ with a complex undulation.

Journal ArticleDOI
TL;DR: Percent recognition of phonemes and whole syllables, measured in both consonant-vowel-consonant (CVC) words and CVC nonsense syllables, is reported for normal young adults listening at four signal-to-noise (S/N) ratios; the results demonstrate an increasing tendency to perceive high-predictability (HP) sentences either as wholes, or not at all, as the S/N ratio deteriorates.
Abstract: Percent recognition of phonemes and whole syllables, measured in both consonant-vowel-consonant (CVC) words and CVC nonsense syllables, is reported for normal young adults listening at four signal-to-noise (S/N) ratios. Similar data are reported for the recognition of words and whole sentences in three types of sentence: high predictability (HP) sentences, with both semantic and syntactic constraints; low predictability (LP) sentences, with primarily syntactic constraints; and zero predictability (ZP) sentences, with neither semantic nor syntactic constraints. The probability of recognition of speech units in context (pc) is shown to be related to the probability of recognition without context (pi) by the equation pc = 1 - (1-pi)k, where k is a constant. The factor k is interpreted as the amount by which the channels of statistically independent information are effectively multiplied when contextual constraints are added. Empirical values of k are approximately 1.3 and 2.7 for word and sentence context, respectively. In a second analysis, the probability of recognition of wholes (pw) is shown to be related to the probability of recognition of the constituent parts (pp) by the equation pw = pjp, where j represents the effective number of statistically independent parts within a whole. The empirically determined mean values of j for nonsense materials are not significantly different from the number of parts in a whole, as predicted by the underlying theory. In CVC words, the value of j is constant at approximately 2.5. In the four-word HP sentences, it falls from approximately 2.5 to approximately 1.6 as the inherent recognition probability for words falls from 100% to 0%, demonstrating an increasing tendency to perceive HP sentences either as wholes, or not at all, as S/N ratio deteriorates.
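The two equations from the abstract translate directly into code; plugging in the empirical constants quoted there (k ≈ 1.3 for word context, j ≈ 2.5 parts per CVC word) gives a feel for the size of the context effect:

```python
def p_context(p_i, k):
    """Recognition probability with context: pc = 1 - (1 - pi)**k,
    where k multiplies the effective channels of independent information."""
    return 1.0 - (1.0 - p_i) ** k

def p_whole(p_part, j):
    """Whole-unit recognition from its parts: pw = pp**j,
    where j is the effective number of independent parts."""
    return p_part ** j

# A word recognized 50% of the time in isolation, heard in word context
# (k ~ 1.3, the empirical value quoted in the abstract):
pc = p_context(0.5, 1.3)   # rises to about 0.59
# A CVC word whose phonemes are each recognized 80% of the time,
# with j ~ 2.5 effective independent parts:
pw = p_whole(0.8, 2.5)     # about 0.57 for the whole word
```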

Journal ArticleDOI
TL;DR: In this article, the authors investigated the acoustic transients emitted after breakdown and cavitation bubble collapse upon focusing a Q-switch laser pulse into a liquid with special emphasis on their modifications induced by a solid boundary.
Abstract: The acoustic transients emitted after breakdown and cavitation bubble collapse upon focusing a Q‐switch laser pulse into a liquid are investigated with special emphasis on their modifications induced by a solid boundary. For measuring the form p(t)/pmax of the pressure pulses an optical technique with a resolution of 10 ns has been developed. When p(t)/pmax is known, the pressure amplitude can be determined even when a transducer with a rise time much longer than the pulse duration is used. The duration of the transients (20–30 ns) and their pressure are nearly the same after breakdown and spherical bubble collapse. During spherical collapse, a maximum pressure of about 60 kbar is developed inside a bubble with Rmax=3.5 mm, and on average 73% of the bubble energy loss is transformed into acoustic energy. The sound emission near a solid boundary strongly depends on the normalized distance γ between the bubble and the boundary. The highest pressures at the boundary are achieved for γ→0; for γ=0.2 and Rmax =3.5 mm it has been found that p=2.5 kbar. These results are discussed with respect to the mechanisms of cavitation erosion important for hydraulic cavitation, laser lithotripsy, and ocular surgery.

Journal ArticleDOI
TL;DR: In this paper, two types of analyses were performed on the measured durations of recordings produced by six talkers reading two scripts of approximately 300 words each; the texts, the combined visual-auditory marking technique, and preliminary results were reported earlier by Crystal and House [J. Acoust. Soc. Am. 72, 705-716 (1982)].
Abstract: Two types of analyses have been performed on the measured durations of recordings produced by six talkers reading two scripts of approximately 300 words each. The texts, the combined visual–auditory marking technique, and preliminary results were reported earlier by Crystal and House [J. Acoust. Soc. Am. 72, 705–716 (1982)]. The average durations and standard deviations of various classes of speech sounds, as well as individual speech sounds, have been determined and segmental measurements are compared to earlier data and to various pertinent published reports. The histograms of the measured durations of various sounds and categories have been fitted with distributions which are, equivalently, the exit‐probability sequence for a Markov chain or the impulse response of an IIR digital‐filter network.

Journal ArticleDOI
TL;DR: It is argued that the favorable performance of the subharmonic-summation algorithm stems from its corresponding more closely with current pitch-perception theories than does the harmonic sieve.
Abstract: In order to account for the phenomenon of virtual pitch, various theories assume implicitly or explicitly that each spectral component introduces a series of subharmonics. The spectral-compression method for pitch determination can be viewed as a direct implementation of this principle. The widespread application of this principle in pitch determination is, however, impeded by numerical problems with respect to accuracy and computational efficiency. A modified algorithm is described that solves these problems. Its performance is tested for normal speech and "telephone" speech, i.e., speech high-pass filtered at 300 Hz. The algorithm outperforms the harmonic-sieve method for pitch determination, while its computational requirements are about the same. The algorithm is described in terms of nonlinear system theory, i.e., subharmonic summation. It is argued that the favorable performance of the subharmonic-summation algorithm stems from its corresponding more closely with current pitch-perception theories than does the harmonic sieve.
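The core of subharmonic summation can be sketched as follows: every candidate fundamental is scored by summing spectral amplitudes at its harmonics with a compressive weight, so each spectral component effectively votes for all of its subharmonics. This is a minimal sketch; the published algorithm adds spectral smoothing, log-frequency resampling, and peak interpolation, and the weight `decay` here is an assumed value.

```python
import numpy as np

def shs_pitch(signal, fs, f0_min=50.0, f0_max=500.0, n_harm=5, decay=0.84):
    """Subharmonic-summation pitch estimate (a sketch of the principle).

    Each candidate f0 is scored by summing magnitude-spectrum samples at
    its first n_harm harmonics, weighted by decay**(n-1)."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    candidates = np.arange(f0_min, f0_max, 1.0)
    scores = np.zeros_like(candidates)
    for i, f0 in enumerate(candidates):
        for n in range(1, n_harm + 1):
            bin_idx = int(round(n * f0 * len(signal) / fs))
            if bin_idx < len(spectrum):
                scores[i] += decay ** (n - 1) * spectrum[bin_idx]
    return candidates[np.argmax(scores)]

# Virtual-pitch check: a harmonic complex of 600, 800, and 1000 Hz with a
# missing 200-Hz fundamental should still yield a 200-Hz pitch estimate.
fs = 8000
t = np.arange(4096) / fs
x = sum(np.sin(2 * np.pi * f * t) for f in (600.0, 800.0, 1000.0))
f0 = shs_pitch(x, fs)
```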

Journal ArticleDOI
TL;DR: The results for the "child learners" suggest that a sensitive period for speech learning is reached long before the age of 12 years, as commonly supposed, and that amount of unaided second-language (L2) experience does not affect adults' L2 pronunciation beyond an initial rapid stage of learning.
Abstract: This study used interval scaling to assess degree of perceived foreign accent in English sentences spoken by native and non‐native talkers. Native English listeners gave significantly higher (i.e., more authentic) pronunciation scores to native speakers of English than to Chinese adults who began learning English at an average age of 7.6 years. The results for the ‘‘child learners’’ suggest that a sensitive period for speech learning is reached long before the age of 12 years, as commonly supposed. Adults who had lived in the U.S. for 5 years did not receive higher scores than those who had lived there for only 1 year, suggesting that the amount of unaided second‐language (L2) experience does not affect adults’ L2 pronunciation beyond an initial rapid stage of learning. Native speakers of Chinese who rated the sentences for foreign accent showed the same pattern of between‐group differences as the native English listeners. The more experienced of two groups of Chinese listeners differentiated native and non‐native talkers to a significantly greater extent than a less experienced group, even though the subjects in both groups spoke English with equally strong foreign accents. This suggests that tacit knowledge of how L2 sentences ‘‘ought’’ to sound increases more rapidly than the ability to produce those sentences.

Journal ArticleDOI
TL;DR: A review of social surveys of the relationship between noise exposure and subjective reactions to it indicated that remarkably similar results have been obtained across different nationalities with different measurement techniques.
Abstract: Social surveys of the relationship between noise exposure and the subjective reactions to it were reviewed. This review indicated that remarkably similar results have been obtained across different nationalities with different measurement techniques. Only a small percentage (typically less than 20%) of the variation in individual reaction is accounted for by noise exposure. Analysis of potential errors in both measurement of noise and reaction suggests that elimination of errors would only slightly increase the observed correlations. Variables such as attitude to the noise source and sensitivity to noise account for more variation in reaction than does noise exposure. The weaker relationship between noise exposure and attitude than between reaction and attitude suggests that the attitude/reaction relationship is not entirely due to noise exposure causing a change in attitude itself. Noise/reaction correlations based on individual data are significantly lower in studies of impulsive noise than of nonimpulsive noise. This may be caused, in part, by the restricted range of noise exposure studied in some socioacoustic investigations of impulsive noise. However, the significantly higher correlations of attitude and reaction in impulsive-noise studies suggest that attitude plays an even larger part, and noise exposure a lesser part, in determining reaction to impulsive noise relative to nonimpulsive noise.

PatentDOI
TL;DR: In this article, an apparatus and method is disclosed for imaging internal features of a living body at a preselected site so that the data can be imaged in a quick, efficient and reliable manner with high resolution.
Abstract: An apparatus and method is disclosed for imaging internal features of a living body at a preselected site so that the data can be imaged in a quick, efficient and reliable manner with high resolution. The apparatus includes a catheter (20) having a longitudinal axis (22), a proximal end (23) and a distal end (24) such that the catheter is adapted to be partially inserted into said living body so that said distal end is positioned relative to the preselected site and imaging data relating to the internal features can be acoustically provided at said distal end by moving said distal end through a plurality of positions relative to the site and generating an acoustic signal when the distal end is at each of said positions. The acoustic energy responsive to each acoustic signal at each of the positions is sensed so as to create a set of data. The location, including the orientation of said distal end of said catheter is sensed at each of said positions. The sets of data and the respective positions from which each was obtained is related to one another so as to create an image of the internal features of the body.

Journal ArticleDOI
TL;DR: It is suggested that Heschl's gyri and surrounding cortex in the right cerebral hemisphere play a crucial role in extracting the pitch corresponding to the fundamental from a complex tone.
Abstract: Sixty‐four patients with unilateral temporal‐lobe excisions as well as 18 normal control subjects were tested in a missing fundamental pitch perception task. Subjects were required to indicate if the pitch of a pair of tones rose or fell. The excisions encroached upon Heschl’s gyri in some cases, whereas, in others, this region was spared. All subjects included for study were able to perform well on a control task in which complex tones including a fundamental were presented. Stimuli for the experimental task, which was procedurally identical with the control task, consisted of several harmonic components spanning the same spectral range, but without a fundamental. Only subjects with right temporal lobectomy in whom Heschl’s gyri were excised committed significantly more errors than the normal control group on this task. Patients with left temporal‐lobe lesions or with anterior right temporal‐lobe excisions were unimpaired. These results suggest that Heschl’s gyri and surrounding cortex in the right cerebral hemisphere play a crucial role in extracting the pitch corresponding to the fundamental from a complex tone.

Journal ArticleDOI
TL;DR: The intense acoustic wave generated at the focus of an extracorporeal shock wave lithotripter is modeled as the impulse response of a parallel RLC circuit, and the zero-order effect of gas diffusion on bubble response is included.
Abstract: The intense acoustic wave generated at the focus of an extracorporeal shock wave lithotripter is modeled as the impulse response of a parallel RLC circuit. The shock wave consists of a zero-rise-time positive spike that falls to 0 at 1 μs, followed by a negative pressure component 6 μs long, with amplitudes scaled to +1000 and −160 bars (P+ and P−, respectively). This pressure wave drives the Gilmore–Akulichev formulation for bubble dynamics; the zero‐order effect of gas diffusion on bubble response is included. The negative pressure component of a 1000‐bar shock wave will cause a preexisting bubble in the 1‐ to 10‐μm range to expand to over 100 times its initial size, R0, for 250 μs, with a peak radius of ∼1400 μm, and then collapse very violently, emitting far-UV or soft x‐ray photons (blackbody). Gas diffusion does not appreciably mitigate the amplitude of the pressure wave radiated at the primary collapse, but does significantly reduce the collapse temperature. Diffusion also increases the bubble radius fro…
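The driving pulse described in the abstract can be sketched as a simple piecewise function: a zero-rise-time spike decaying to 0 at 1 μs, then a 6-μs tensile phase. This is an illustrative stand-in only; the paper models the pulse as an RLC impulse response, and the linear spike decay and triangular negative phase assumed here are hypothetical shapes.

```python
def lithotripter_pressure(t_us, p_pos=1000.0, p_neg=-160.0,
                          t_pos=1.0, t_neg=6.0):
    """Idealized lithotripter pulse in bars at time t_us (microseconds).

    Piecewise-linear sketch: a zero-rise-time positive spike decaying to 0
    over t_pos, then a triangular negative phase lasting t_neg. The exact
    shapes are assumptions, not the paper's RLC impulse response."""
    if t_us < 0.0:
        return 0.0
    if t_us < t_pos:                       # linear decay of the +1000-bar spike
        return p_pos * (1.0 - t_us / t_pos)
    if t_us < t_pos + t_neg:               # triangular tensile tail to -160 bars
        frac = (t_us - t_pos) / t_neg
        return p_neg * (1.0 - abs(2.0 * frac - 1.0))
    return 0.0

peak = lithotripter_pressure(0.0)    # +1000 bars at onset
trough = lithotripter_pressure(4.0)  # deepest tension at the tail's midpoint
```

It is this extended negative tail, not the brief positive spike, that lets a micron-sized nucleus grow a hundredfold before collapsing.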

PatentDOI
TL;DR: A method for creating word models for a large vocabulary, natural language dictation system that may be used for connected speech as well as for discrete utterances.
Abstract: A method for creating word models for a large vocabulary, natural language dictation system. A user with limited typing skills can create documents with little or no advance training of word models. As the user is dictating, the user speaks a word which may or may not already be in the active vocabulary. The system displays a list of the words in the active vocabulary which best match the spoken word. By keyboard or voice command, the user may choose the correct word from the list or may choose to edit a similar word if the correct word is not on the list. Alternately, the user may type or speak the initial letters of the word. Then the recognition algorithm is called again satisfying the initial letters, and the choices displayed again. A word list is then also displayed from a large backup vocabulary. The best words to display from the backup vocabulary are chosen using a statistical language model and optionally word models derived from a phonemic dictionary. When the correct word is chosen by the user, the speech sample is used to create or update an acoustic model for the word, without further intervention by the user. As the system is used, it also constantly updates its statistical language model. The system gets more and more word models and keeps improving its performance the more it is used. The system may be used for connected speech as well as for discrete utterances.

PatentDOI
TL;DR: In this paper, a voice recognition system used in a telephone apparatus comprises speech input means, speech recognizing means (350a), telephone number memory means (388) for storing telephone numbers corresponding to speech inputs, and calling means for reading out a telephone number corresponding to a speech input recognized by the speech recognizer and making a call corresponding to the readout telephone number.
Abstract: A voice recognition system used in a telephone apparatus comprises speech input means, speech recognizing means (350a) for recognizing a speech input at the speech input means, telephone number memory means (388) for storing telephone numbers corresponding to speech inputs, and calling means for reading out a telephone number corresponding to a speech input recognized by the speech recognizing means and making a call corresponding to the readout telephone number, wherein contents of the telephone number memory means can be entirely erased. The telephone numbers of the third parties subjected to voice dialing can be effectively registered within a limited memory capacity.

Journal ArticleDOI
TL;DR: The results of these studies demonstrate that backpropagation learning can be used with complex, natural data to identify a feature structure that can serve as the basis for both analysis and nontrivial pattern recognition.
Abstract: In the work described here, the backpropagation neural network learning procedure is applied to the analysis and recognition of speech. This procedure takes a set of input/output pattern pairs and attempts to learn their functional relationship; it develops the necessary representational features during the course of learning. A series of computer simulation studies was carried out to assess the ability of these networks to accurately label sounds, to learn to recognize sounds without labels, and to learn feature representations of continuous speech. These studies demonstrated that the networks can learn to label presegmented test tokens with accuracies of up to 95%. Networks trained on segmented sounds using a strategy that requires no external labels were able to recognize and delineate sounds in continuous speech. These networks developed rich internal representations that included units which corresponded to such traditional distinctions as vowels and consonants, as well as units that were sensitive to novel and nonstandard features. Networks trained on a large corpus of unsegmented, continuous speech without labels also developed interesting feature representations, which may be useful in both segmentation and label learning. The results of these studies, while preliminary, demonstrate that backpropagation learning can be used with complex, natural data to identify a feature structure that can serve as the basis for both analysis and nontrivial pattern recognition.
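As a concrete illustration of the learning procedure the abstract applies to speech, the sketch below trains a one-hidden-layer network with backpropagation on toy data standing in for labeled frames. The architecture, data, and learning rate are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((40, 8))                                  # 40 "frames", 8 features
y = (X.mean(axis=1) > 0.5).astype(float).reshape(-1, 1)  # toy binary label

W1 = rng.normal(0.0, 0.5, (8, 6)); b1 = np.zeros(6)      # hidden layer weights
W2 = rng.normal(0.0, 0.5, (6, 1)); b2 = np.zeros(1)      # output layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1 + b1)          # hidden "feature" representation
    return h, sigmoid(h @ W2 + b2)

_, out = forward(X)
loss_before = np.mean((out - y) ** 2)

lr = 2.0
for _ in range(1000):
    h, out = forward(X)
    d2 = (out - y) * out * (1.0 - out)      # error delta at output layer
    d1 = (d2 @ W2.T) * h * (1.0 - h)        # delta backpropagated to hidden layer
    W2 -= lr * h.T @ d2 / len(X); b2 -= lr * d2.mean(axis=0)
    W1 -= lr * X.T @ d1 / len(X); b1 -= lr * d1.mean(axis=0)

_, out = forward(X)
loss_after = np.mean((out - y) ** 2)
```

The hidden units learn whatever intermediate representation reduces output error, which is how the networks in the study developed units corresponding to distinctions such as vowels versus consonants without those features being supplied explicitly.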

Journal ArticleDOI
TL;DR: In this paper, the influence of errors on two-microphone measurements in ducts without flow has been investigated and the conclusions from the earlier work have been extended to the case with flow.
Abstract: In an earlier work [H. Boden and M. Abom, J. Acoust. Soc. Am. 79, 541–549 (1986)] the influence of errors on two‐microphone measurements in ducts without flow has been studied. The aim of this article is mainly to extend the earlier work to include the effects of mean flow and also of attenuation during the sound propagation. First, a short review of the various existing two‐microphone methods is made. The errors in the measured input data are then analyzed and special attention is paid to the effects of neglected attenuation, nonideal microphones, and flow noise. The influence of errors on the calculated quantities has been investigated and the conclusions from the earlier work have been extended to the case with flow. It is also shown that the neglect of attenuation between the microphones leads to a low‐frequency limit for the applicability of the two‐microphone method. Finally, a new technique for measuring the Mach number using a two‐microphone method is suggested.
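The decomposition underlying the two-microphone method can be sketched for the no-flow, no-attenuation case: synthesize the pressures at two microphone positions for a known reflection coefficient, form the transfer function between them, and invert the standard transfer-function formula to recover the coefficient. The geometry and values below are illustrative; the flow and attenuation corrections analyzed in the article are omitted.

```python
import numpy as np

# Plane waves in a duct without flow, time convention exp(+j*w*t),
# termination (sample) at x = 0, microphones upstream at x < 0.
c0 = 343.0                        # speed of sound [m/s]
f = 500.0                         # test frequency [Hz]
k = 2 * np.pi * f / c0            # wavenumber [1/m]
l, s = 0.20, 0.03                 # mic 2 to sample distance, mic spacing [m]
R_true = 0.6 * np.exp(1j * 0.8)   # assumed reflection coefficient

def p(x):
    # Incident wave traveling toward the sample plus reflected wave.
    return np.exp(-1j * k * x) + R_true * np.exp(1j * k * x)

# Transfer function from mic 1 (at x = -(l+s)) to mic 2 (at x = -l).
H12 = p(-l) / p(-(l + s))

# Standard two-microphone (transfer-function) estimate of R.
R_est = (np.exp(2j * k * (l + s))
         * (H12 - np.exp(-1j * k * s)) / (np.exp(1j * k * s) - H12))
```

The error sensitivity the article analyzes enters through H12: the denominator nearly vanishes when the spacing approaches half a wavelength, and neglected attenuation between the microphones biases H12 at low frequencies, which is the origin of the low-frequency limit noted in the abstract.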

Journal ArticleDOI
TL;DR: A computational model of mechanical to neural transduction at the hair cell-auditory-nerve synapse that produces a stream of events that are precisely located in time in response to an arbitrary stimulus is presented, which is computationally convenient and well suited to use in automatic recognition devices that use models of the peripheral auditory system as input devices.
Abstract: A computational model of mechanical to neural transduction at the hair cell–auditory‐nerve synapse is presented. It produces a stream of events (spikes) that are precisely located in time in response to an arbitrary stimulus and is intended for use as an input to automatic speech recognition systems as well as a contribution to the theory of the origin of auditory‐nerve spike activity. The behavior of the model is compared to data from animal studies in the following tests: (a) rate‐intensity functions for adapted and unadapted responding; (b) two‐component short‐term adaptation; (c) frequency‐limited phase locking of events; (d) additivity of responding following stimulus‐intensity increases and decreases; (e) recovery of spontaneous activity following stimulus offset; and (f) recovery of ability to respond to a second stimulus following offset of a first stimulus. The behavior of the model compares well with empirical data but discrepancies in tests (d) and (f) point to the need for further development. Additional functions that have been successfully simulated in previous tests include realistic interspike‐interval histograms for silence and intense sinusoidal stimuli, realistic poststimulus period histograms at various intensities and nonmonotonic functions relating incremental and decremental responses to background stimulus intensity. The model is computationally convenient and well suited to use in automatic recognition devices that use models of the peripheral auditory system as input devices. It is particularly well suited to devices that require stimulus phase information to be preserved at low frequencies.
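A minimal transmitter-depletion sketch conveys the flavor of such a transduction model: a reservoir of free transmitter is released at a stimulus-dependent rate and replenished at a fixed rate, producing an onset overshoot followed by the short-term adaptation listed in test (b). The equations and constants here are simplified illustrations, not the published model's.

```python
import numpy as np

def ihc_rate(stim, dt=1e-4, y=10.0, base_k=5.0, gain=40.0):
    """Toy hair-cell transmitter-depletion model (illustrative only).

    stim    -- nonnegative stimulus envelope, one value per time step
    dt      -- time step [s]
    y       -- replenishment rate of the transmitter reservoir
    base_k  -- resting release permeability (gives spontaneous activity)
    gain    -- stimulus-to-permeability gain (assumed constant)

    Instantaneous firing rate is taken proportional to the released
    amount k*q, where q is the free-transmitter level.
    """
    q = 1.0
    rates = []
    for x in stim:
        kk = base_k + gain * max(x, 0.0)      # stimulus-driven permeability
        rates.append(kk * q)                  # release drives firing rate
        q += dt * (y * (1.0 - q) - kk * q)    # replenish minus depletion
    return np.array(rates)

# Step stimulus: silence, then a constant tone envelope.
stim = np.concatenate([np.zeros(2000), 0.5 * np.ones(4000)])
r = ihc_rate(stim)
```

At stimulus onset the full reservoir is released at the higher permeability, so the rate overshoots and then decays toward a lower adapted level; after offset the reservoir refills, reproducing (qualitatively) the recovery behaviors in tests (e) and (f).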

Journal ArticleDOI
TL;DR: The temporal-window model successfully accounts for the data from a variety of experiments measuring temporal resolution, however, it fails to predict certain aspects of forward masking and of the detection of amplitude modulation at high rates.
Abstract: This article examines the idea that the temporal resolution of the auditory system can be modeled using a temporal window (an intensity weighting function) analogous to the auditory filter measured in the frequency domain. To estimate the shape of the hypothetical temporal window, threshold was measured for a brief sinusoidal signal presented in a temporal gap between two bursts of noise. The duration of the gap was systematically varied and the signal was placed both symmetrically and asymmetrically within the gap. The data were analyzed by assuming that the temporal window had the form of a simple mathematical expression with a small number of free parameters. The values of the parameters were adjusted to give the best fit to the data. The analysis assumed that, for each condition, the temporal window was centered at the time giving the highest signal-to-masker ratio, and that threshold corresponded to a fixed ratio of signal energy to masker energy at the output of the window. The data were fitted well by modeling each side of the window as the sum of two rounded-exponential functions. The window was highly asymmetric, having a shallower slope for times before the center than for times after. The equivalent rectangular duration (ERD) of the window was typically about 8 ms. The ERD increased slightly when the masker level was decreased, but did not differ significantly for signal frequencies of 500 and 2000 Hz. The temporal-window model successfully accounts for the data from a variety of experiments measuring temporal resolution. However, it fails to predict certain aspects of forward masking and of the detection of amplitude modulation at high rates.
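Under the stated rounded-exponential form, the window shape and its equivalent rectangular duration can be reproduced numerically. The sketch below uses a single rounded exponential per side with assumed time constants chosen so the ERD comes out near the reported 8 ms; the article actually fits a sum of two such functions per side with free parameters.

```python
import numpy as np

# Asymmetric rounded-exponential temporal window:
# W(t) = (1 + |t|/T) * exp(-|t|/T), with a longer time constant before
# the center (shallower slope) than after, as found in the fits.
Tb, Ta = 3e-3, 1e-3          # assumed before/after time constants [s]

def roex(t, T):
    a = np.abs(t) / T
    return (1.0 + a) * np.exp(-a)

dt = 1e-6
t = np.arange(-50e-3, 50e-3, dt)
W = np.where(t < 0, roex(t, Tb), roex(t, Ta))

# Equivalent rectangular duration: window area divided by peak height.
erd_ms = W.sum() * dt / W.max() * 1e3
```

Each rounded-exponential side integrates to 2T, so the ERD of this window is 2(Tb + Ta) = 8 ms analytically, and the shallower pre-center slope means most of that duration lies before the window center.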

PatentDOI
TL;DR: In this paper, a method and apparatus for localizing an object in space, such as a gallstone in a human, using stereo imaging and ultrasound imaging is described in connection with a dry table shock wave lithotripter, and uses a conventional ultrasound imaging system but wherein the ultrasound transducer has been modified to enable its location in space to be readily determined automatically.
Abstract: The present disclosure relates to a method and apparatus for localizing an object in space, such as a gallstone in a human, using stereo imaging and ultrasound imaging. The localization system is described in connection with a dry table shock wave lithotripter, and uses a conventional ultrasound imaging system but wherein the ultrasound transducer has been modified to enable its location in space to be readily determined automatically. This is accomplished by providing a hood fixed to this transducer, the hood having a plurality of reference points which may be in the form of light sources such as LEDs. This hood is imaged by head and foot video cameras. By calibrating the cameras with respect to a reference point, calibrating the focal point of the shock wave system with respect to the reference point, and knowing the relationship of the hood to the ultrasound transducer, the position of the stone with respect to the reference point can be determined by ultrasound imaging of the stone, together with suitable storage, digitizing, and processing of the ultrasound and camera images. Given this information, the patient with the stone can be suitably moved so as to position the stone at the focal point of the shock wave generating system.
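The stereo step, reduced to its geometric core, is a least-squares intersection of calibrated viewing rays. The sketch below assumes each calibrated camera yields a ray (optical center plus direction) toward an LED on the hood, and omits the camera calibration and image processing the patent covers.

```python
import numpy as np

def triangulate(centers, dirs):
    """Least-squares intersection of viewing rays from calibrated cameras.

    centers -- list of 3-vectors, camera optical centers
    dirs    -- list of 3-vectors, ray directions toward the target point

    Returns the point minimizing the summed squared distance to all
    rays (exact when the rays truly intersect).
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, dirs):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector onto plane normal to the ray
        A += P
        b += P @ c
    return np.linalg.solve(A, b)

# Two cameras viewing a known point (illustrative geometry).
p_true = np.array([1.0, 2.0, 3.0])
centers = [np.array([0.0, 0.0, 0.0]), np.array([10.0, 0.0, 0.0])]
dirs = [p_true - c for c in centers]
p_est = triangulate(centers, dirs)
```

Locating several hood LEDs this way fixes the transducer pose in the reference frame, after which the ultrasound image places the stone relative to the transducer and hence relative to the shock-wave focal point.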

PatentDOI
TL;DR: In this article, an ultrasonic device for applying cavitation force to an unwanted material is provided, particularly useful for removing plaque from a human artery wherein a portion of the device can enter an artery and pass through the artery to the vicinity of the plaque.
Abstract: An ultrasonic device for applying cavitation force to an unwanted material is provided. The device is particularly useful for removing plaque from a human artery wherein a portion of the device can enter an artery and pass through the artery to the vicinity of the plaque. The device includes a solid wire of titanium material, a transducer, a generator for providing vibration energy via the transducer, and a handpiece enclosing the transducer and having a tapered end portion with an exponential surface of slightly concave profile. The tapered end portion is fixedly connected to an inner end of the wire. For human applications, the wire has an overall length in the range of about 5 inches to 40 inches and has a uniform outer diameter in the range of about 0.015 inches to 0.040 inches. The device includes a catheter assembly having a catheter tube enclosing the wire and includes a container unit mounted on the tapered end portion of the handpiece. The container unit has a fluid chamber for receiving a contrast material which passes through the chamber, through the catheter tube, to the outer end tip portions of the wire and tube in the area of the plaque to be removed. The container unit also has a coupling unit forming an outer end wall thereof having a fixed portion and a rotatable knob portion for moving the catheter tube axially relative to the wire for adjusting the extension of the outer end tip portion of the wire beyond the outer end tip portion of the catheter tube.

Journal ArticleDOI
TL;DR: In this article, the results of acoustical and optical experiments in which "moderately"" underexpanded sonic round jets impinge on flat plates normal to the jet axis are presented and analyzed.
Abstract: The results of acoustical and optical experiments in which ‘‘moderately’’ underexpanded sonic round jets impinge on flat plates normal to the jet axis are presented and analyzed. Periodic unstable oscillations of the jet flow, with the resultant radiation of sound of discrete frequencies, occur over a wide variation of control parameters, namely, pressure ratio, plate size, and spacing of the plate from the jet nozzle. For ‘‘small’’ plates, the principal oscillations with λ/D about 4 (λ=acoustic wavelength, D=nozzle diameter) occur when the standoff shock wave lies in a pressure recovery region of the periodic cellular structure of the choked jet and is, therefore, highly unstable; then the oscillations have key characteristics in common with the high‐harmonic excitation of Hartmann’s acoustic air‐jet generator. An analogous feedback mechanism in the standoff zone is suggested in which pressure waves reflected from the plate trigger the motion of the unstable shock wave. For ‘‘large’’ plates, acoustic fee...

PatentDOI
TL;DR: In this paper, the spectral energy analysis is carried out using pairs of high pass and low pass digital filters in cascade relation, with the output of each low pass filter being provided to the next pair of high-pass and low-pass filters.
Abstract: A hearing aid system utilizes digital signal processing to correct for the hearing deficit of a particular user and to maximize the intelligibility of the desired audio signal relative to noise. An analog signal from a microphone is converted to digital data which is operated on by a digital signal processor, with the output of the digital signal processor being converted back to an analog signal which is amplified and provided to the user. The digital signal processor includes a time varying spectral filter having filter coefficients which can be varied on a quasi-real time basis to spectrally shape the signal to match the hearing deficit of the user and to accommodate ambient signal and noise levels. The coefficients of the spectral filter are determined by estimating the energy in several frequency bands within the frequency range of the input signal, and using those energy estimates to calculate desired gains for the frequency bands and corresponding spectral filter coefficients. The spectral energy analysis may be carried out using pairs of high pass and low pass digital filters in cascade relation, with the output of each low pass filter being provided to the next pair of high pass and low pass filters. The rate at which output data is provided from the filters in each pair may be reduced from the sample rate of input data by one half for succeeding pairs of filters in the cascade to thereby reduce the computation time required.
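The cascaded filter-pair analysis can be sketched with the simplest possible halfband pair (Haar-style averaging and differencing): each stage splits the signal into a high band and a low band, decimates by two, and feeds the low band to the next pair, so each succeeding pair runs at half the rate, as the patent describes. Real hearing-aid filters would have far sharper responses; this is an illustrative toy.

```python
import numpy as np

def octave_band_energies(x, levels=4):
    """Estimate signal energy in octave-spaced bands via a cascade of
    high-pass/low-pass halfband pairs with decimation by two.

    x      -- input samples (length divisible by 2**levels)
    levels -- number of cascade stages

    Returns a list of band energies, highest band first, plus the
    residual lowest band.  Haar-style 2-tap filters stand in for the
    patent's digital filter pairs.
    """
    energies = []
    for _ in range(levels):
        low = (x[0::2] + x[1::2]) / 2        # low-pass then decimate by 2
        high = (x[0::2] - x[1::2]) / 2       # high-pass then decimate by 2
        energies.append(np.mean(high ** 2))  # energy estimate for this band
        x = low                              # feed low band to the next pair
    energies.append(np.mean(x ** 2))         # residual lowest band
    return energies

# A near-Nyquist tone lands in the top band; a DC signal in the bottom.
e_hi = octave_band_energies(np.cos(np.pi * np.arange(256)))
e_dc = octave_band_energies(np.ones(256))
```

Because the data rate halves at each stage, the total computation for all bands is only about twice that of the first stage, which is the computational saving the patent's cascade is designed to exploit; the band energies then drive the gain calculation for the time-varying spectral filter.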