
Showing papers in "Journal of the Acoustical Society of America in 2000"


Journal ArticleDOI
TL;DR: The present results indicate that spectral peak location, spectral moments, and both normalized and relative amplitude serve to distinguish all four places of fricative articulation.
Abstract: This study constitutes a large-scale comparative analysis of acoustic cues for classification of place of articulation in fricatives. To date, no single metric has been found to classify fricative place of articulation with a high degree of accuracy. This study presents spectral, amplitudinal, and temporal measurements that involve both static properties (spectral peak location, spectral moments, noise duration, normalized amplitude, and F2 onset frequency) and dynamic properties (relative amplitude and locus equations). While all cues (except locus equations) consistently serve to distinguish sibilant from nonsibilant fricatives, the present results indicate that spectral peak location, spectral moments, and both normalized and relative amplitude serve to distinguish all four places of fricative articulation. These findings suggest that these static and dynamic acoustic properties can provide robust and unique information about all four places of articulation, despite variation in speaker, vowel context, and voicing.
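For readers who want to experiment with these static cues, the sketch below shows one common way to compute spectral peak location and the first four spectral moments from a fricative noise frame. The window, FFT size, and normalization are illustrative choices, not the paper's exact analysis settings.

import numpy as np

def spectral_moments(frame, fs, nfft=1024):
    """Spectral peak location and first four spectral moments of a windowed
    noise frame (sketch): centroid (Hz), variance (Hz^2), skewness, kurtosis,
    treating the normalized magnitude spectrum as a probability distribution."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), nfft))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    p = mag / mag.sum()                               # spectrum as distribution
    m1 = np.sum(freqs * p)                            # spectral centroid
    m2 = np.sum((freqs - m1) ** 2 * p)                # variance
    m3 = np.sum((freqs - m1) ** 3 * p) / m2 ** 1.5    # skewness
    m4 = np.sum((freqs - m1) ** 4 * p) / m2 ** 2      # kurtosis
    peak_hz = freqs[np.argmax(mag)]                   # spectral peak location
    return peak_hz, m1, m2, m3, m4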

716 citations


PatentDOI
TL;DR: In this paper, a method of manufacturing high-power sandwich type ultrasonic transducers without the need for a trimming process is described, along with a new method of tuning such transducers in which a dimension or material property of a selected tuning element shifts the measured resonant frequency of the transducer to a desired value.
Abstract: This invention is a method of manufacturing high-power sandwich type ultrasonic transducers (82), and more particularly, a new method of tuning high-power sandwich type ultrasonic transducers (86) without the need for a trimming process. A method in accordance with the present invention includes the steps of assembling a sandwich type ultrasonic transducer (82), measuring the resonant frequency of the ultrasonic transducer (82), and selecting from a plurality of tuning elements (50), whereby a dimension or material property of a selected tuning element alters the measured resonant frequency of the ultrasonic transducer (82) to a desired resonant frequency after the tuning element (50) is attached to the ultrasonic transducer (82).

693 citations


PatentDOI
TL;DR: In this article, a control system alters one or more characteristics of an ablating element, such as focal length, frequency, and power, to ablate tissue; the ablating element can also deliver ultrasound that is focused in at least one dimension.
Abstract: A control system alters one or more characteristics of an ablating element to ablate tissue. In one aspect, the control system delivers energy nearer to the surface of the tissue by changing the frequency or power. In another aspect, the ablating element delivers focused ultrasound which is focused in at least one dimension. The ablating device may also have a number of ablating elements with different characteristics such as focal length.

512 citations


Journal ArticleDOI
TL;DR: The data suggest that visual cues derived from the dynamic movements of the face during speech production interact with time-aligned auditory cues to enhance sensitivity in auditory detection.
Abstract: Classic accounts of the benefits of speechreading to speech recognition treat auditory and visual channels as independent sources of information that are integrated fairly early in the speech perception process. The primary question addressed in this study was whether visible movements of the speech articulators could be used to improve the detection of speech in noise, thus demonstrating an influence of speechreading on the ability to detect, rather than recognize, speech. In the first experiment, ten normal-hearing subjects detected the presence of three known spoken sentences in noise under three conditions: auditory-only (A), auditory plus speechreading with a visually matched sentence (AV(M)), and auditory plus speechreading with a visually unmatched sentence (AV(UM)). When the speechread sentence matched the target sentence, average detection thresholds improved by about 1.6 dB relative to the auditory condition. However, the amount of threshold reduction varied significantly for the three target sentences (from 0.8 to 2.2 dB). There was no difference in detection thresholds between the AV(UM) condition and the A condition. In a second experiment, the effects of visually matched orthographic stimuli on detection thresholds were examined for the same three target sentences in six subjects who participated in the earlier experiment. When the orthographic stimuli were presented just prior to each trial, average detection thresholds improved by about 0.5 dB relative to the A condition. However, unlike the AV(M) condition, the detection improvement due to orthography was not dependent on the target sentence. Analyses of correlations between area of mouth opening and acoustic envelopes derived from selected spectral regions of each sentence (corresponding to the wide-band speech, and first, second, and third formant regions) suggested that AV(M) threshold reduction may be determined by the degree of auditory-visual temporal coherence, especially between the area of lip opening and the envelope derived from mid- to high-frequency acoustic energy. Taken together, the data (for these sentences at least) suggest that visual cues derived from the dynamic movements of the face during speech production interact with time-aligned auditory cues to enhance sensitivity in auditory detection. The amount of visual influence depends in part on the degree of correlation between acoustic envelopes and visible movement of the articulators.
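The envelope-coherence analysis lends itself to a simple numerical check. The sketch below extracts a band-limited amplitude envelope that could be correlated against a time-aligned area-of-mouth-opening track; the band edges, filter order, and alignment are illustrative assumptions, not the study's exact settings.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_envelope(x, fs, lo, hi):
    """Amplitude envelope of x restricted to [lo, hi] Hz (illustrative)."""
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    return np.abs(hilbert(sosfiltfilt(sos, x)))

# e.g. rho = np.corrcoef(lip_area, band_envelope(speech, fs, 1500, 3000))[0, 1]
# (band edges are a stand-in for one formant region; the study examined the
#  wide-band envelope and the F1-F3 regions, with both tracks time-aligned)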

503 citations


Journal ArticleDOI
TL;DR: A new type of thermoacoustic engine based on traveling waves and ideally reversible heat transfer is described and data are presented which show the nearly complete elimination of the streaming convective heat loads.
Abstract: A new type of thermoacoustic engine based on traveling waves and ideally reversible heat transfer is described. Measurements and analysis of its performance are presented. This new engine outperforms previous thermoacoustic engines, which are based on standing waves and intrinsically irreversible heat transfer, by more than 50%. At its most efficient operating point, it delivers 710 W of acoustic power to its resonator with a thermal efficiency of 0.30, corresponding to 41% of the Carnot efficiency. At its most powerful operating point, it delivers 890 W to its resonator with a thermal efficiency of 0.22. The efficiency of this engine can be degraded by two types of acoustic streaming. These are suppressed by appropriate tapering of crucial surfaces in the engine and by using additional nonlinearity to induce an opposing time-averaged pressure difference. Data are presented which show the nearly complete elimination of the streaming convective heat loads. Analysis of these and other irreversibilities show which components of the engine require further research to achieve higher efficiency. Additionally, these data show that the dynamics and acoustic power flows are well understood, but the details of the streaming suppression and associated heat convection are only qualitatively understood.

494 citations


Journal ArticleDOI
TL;DR: A database of speech samples from eight different talkers has been collected for use in multitalker communications research and the nature of the corpus, the data collection methodology, and the means for obtaining copies of the database are presented.
Abstract: A database of speech samples from eight different talkers has been collected for use in multitalker communications research. Descriptions of the nature of the corpus, the data collection methodology, and the means for obtaining copies of the database are presented.

488 citations


Journal ArticleDOI
TL;DR: It is concluded that the shell strongly alters the acoustic behavior of the bubbles: the stiffness and viscosity of the particles are mainly determined by the encapsulating shell, not by the air inside.
Abstract: A model for the oscillation of gas bubbles encapsulated in a thin shell has been developed. The model depends on viscous and elastic properties of the shell, described by thickness, shear modulus, and shear viscosity. This theory was used to describe an experimental ultrasound contrast agent from Nycomed, composed of air bubbles encapsulated in a polymer shell. Theoretical calculations were compared with measurements of acoustic attenuation at amplitudes where bubble oscillations are linear. A good fit between measured and calculated results was obtained. The results were used to estimate the viscoelastic properties of the shell material. The shell shear modulus was estimated at between 10.6 and 12.9 MPa, and the shell viscosity at between 0.39 and 0.49 Pa·s. The shell thickness was 5% of the particle radius. These results imply that the particles are around 20 times more rigid than free air bubbles, and that the oscillations are heavily damped, corresponding to Q-values around 1. We conclude that the shell strongly alters the acoustic behavior of the bubbles: the stiffness and viscosity of the particles are mainly determined by the encapsulating shell, not by the air inside.
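As a rough numerical illustration of the shell-stiffening effect, the sketch below compares the free (Minnaert) bubble resonance with a simplified thin-shell estimate of the form w0^2 = (3*kappa*p0 + 12*GS*dS/R) / (rho*R^2). This simplified expression and the parameter values are assumptions for illustration, not the paper's full viscoelastic model.

import numpy as np

rho = 1000.0       # density of the surrounding liquid (kg/m^3)
p0 = 101.325e3     # ambient pressure (Pa)
kappa = 1.4        # polytropic exponent of air (adiabatic; assumption)
R = 2.0e-6         # particle radius (m); illustrative value
dS = 0.05 * R      # shell thickness, 5% of the radius as reported
GS = 11.7e6        # shell shear modulus (Pa), mid-range of the estimates

f_free = np.sqrt(3 * kappa * p0 / rho) / (2 * np.pi * R)   # Minnaert resonance
f_shell = np.sqrt((3 * kappa * p0 + 12 * GS * dS / R) / rho) / (2 * np.pi * R)
stiffness_ratio = (3 * kappa * p0 + 12 * GS * dS / R) / (3 * kappa * p0)
print(f"free: {f_free/1e6:.2f} MHz, shelled: {f_shell/1e6:.2f} MHz, "
      f"stiffness ratio: {stiffness_ratio:.0f}")

With these numbers the stiffness ratio comes out near 17, consistent with the paper's report that the particles are roughly 20 times more rigid than free air bubbles.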

422 citations


Journal ArticleDOI
Jacob Benesty
TL;DR: A new approach is proposed that is based on eigenvalue decomposition; it performs well and is very accurate for time delay estimation in acoustic source localization.
Abstract: To find the position of an acoustic source in a room, the relative delay between two (or more) microphone signals for the direct sound must be determined. The generalized cross-correlation method is the most popular technique to do so and is well explained in a landmark paper by Knapp and Carter. In this paper, a new approach is proposed that is based on eigenvalue decomposition. Indeed, the eigenvector corresponding to the minimum eigenvalue of the covariance matrix of the microphone signals contains the impulse responses between the source and the microphone signals (and therefore all the information we need for time delay estimation). In experiments, the proposed algorithm performs well and is very accurate.
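A minimal numerical sketch of the eigenvector idea follows, under ideal noise-free, two-microphone assumptions; the frame length, peak-picking, and names are illustrative, not the paper's algorithm in full.

import numpy as np

def aed_tdoa(x1, x2, L=64):
    """Relative delay of x2 with respect to x1 (in samples): the eigenvector
    of the smallest eigenvalue of the stacked-signal covariance matrix
    contains the two source-to-microphone impulse responses (sketch)."""
    N = len(x1) - L + 1
    X = np.empty((2 * L, N))
    for n in range(N):
        X[:L, n] = x1[n:n + L][::-1]   # reversed frames -> true convolution
        X[L:, n] = x2[n:n + L][::-1]
    Rcov = X @ X.T / N
    w, V = np.linalg.eigh(Rcov)        # ascending eigenvalues
    u = V[:, 0]                        # eigenvector of the minimum eigenvalue
    h2, h1 = u[:L], -u[L:]             # u = [h2, -h1] up to a scale factor
    return int(np.argmax(np.abs(h2)) - np.argmax(np.abs(h1)))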

395 citations


PatentDOI
Robert S. Cooper, Jeff F. McElroy, Walter Rolandi, Derek Sanders, Richard M. Ulmer
TL;DR: In this article, a computer-based method for performing a command via a voice user interface on a subset of objects is presented; the retrieved object consists of text, which is converted to voice output.
Abstract: A computer based method for performing a command via a voice user interface on a subset of objects. The subset is selected from a set of objects, each having an object type. At least one taggable field is associated with the object type and has a corresponding value. The set of objects is stored in the computer memory. An utterance is received from the user and includes a command, an object type selection, a taggable field selection, and a value for the taggable field. Responsive to the utterance, at least one object is retrieved from the set of objects, the object of the type selected by the user and having a value in the taggable field selection that matches the taggable field value received from the user. The command is performed on the object. The object consists of text which is converted to voice output.

394 citations


PatentDOI
TL;DR: In this article, an electronically phased array is used for controlling the focal point of an ultrasound beam and the ultrasound beam produced by the transducer elements can also be electronically steered.
Abstract: Ultrasound applicators able to both image a treatment site and administer ultrasound therapy include an array of transducer elements that can be focused. In several embodiments, an electronically phased array is used for controlling the focal point of an ultrasound beam. The ultrasound beam produced thereby can also be electronically steered. To reduce the quality factor or Q of the array when the array is used for imaging, an electronic switch is selectively closed, placing a resistance in parallel with each of the array elements. A flexible array is employed in several embodiments and is selectively bent or flexed to vary its radius of curvature and thus control the focal point and/or a direction of focus of the array. In another embodiment, each of the transducer elements comprising the array are individually mechanically pivotable to steer the ultrasonic beam produced by the transducer elements.

382 citations


Journal ArticleDOI
TL;DR: The Handbook of Neural Network Signal Processing brings together applications that were previously scattered among various publications to provide an up-to-date, detailed treatment of the subject from an engineering point of view.
Abstract: From the Publisher: The use of neural networks is permeating every area of signal processing. They can provide powerful means for solving many problems, especially in nonlinear, real-time, adaptive, and blind signal processing. The Handbook of Neural Network Signal Processing brings together applications that were previously scattered among various publications to provide an up-to-date, detailed treatment of the subject from an engineering point of view. The authors cover basic principles, modeling, algorithms, architectures, implementation procedures, and well-designed simulation examples of audio, video, speech, communication, geophysical, sonar, radar, medical, and many other signals. The field of neural networks and their applications to signal processing is evolving rapidly, and a handy reference describing current applications is much needed. The Handbook of Neural Network Signal Processing provides this service for all engineers and scientists in the field.

Journal ArticleDOI
TL;DR: Evidence is given that the Tilt model goes a long way to satisfying the desired goals of such a representation in that it has the right number of degrees of freedom to be able to describe and synthesize intonation accurately.
Abstract: This paper introduces the Tilt intonational model and describes how this model can be used to automatically analyze and synthesize intonation. In the model, intonation is represented as a linear sequence of events, which can be pitch accents or boundary tones. Each event is characterized by continuous parameters representing amplitude, duration, and tilt (a measure of the shape of the event). The paper describes an event detector, in effect an intonational recognition system, which produces a transcription of an utterance's intonation. The features and parameters of the event detector are discussed and performance figures are shown on a variety of read and spontaneous speaker independent conversational speech databases. Given the event locations, algorithms are described which produce an automatic analysis of each event in terms of the Tilt parameters. Synthesis algorithms are also presented which generate F0 contours from Tilt representations. The accuracy of these is shown by comparing synthetic F0 contours to real F0 contours. The paper concludes with an extensive discussion on linguistic representations of intonation and gives evidence that the Tilt model goes a long way to satisfying the desired goals of such a representation in that it has the right number of degrees of freedom to be able to describe and synthesize intonation accurately.
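The tilt parameter itself is compact enough to state directly. The sketch below uses the commonly cited form, averaging the amplitude and duration asymmetries of an event's rise and fall; treat this exact weighting as an assumption of the illustration.

def tilt(a_rise, a_fall, d_rise, d_fall):
    """Tilt parameter of an intonational event: +1 for a pure rise, -1 for a
    pure fall, 0 for a symmetric rise-fall. Combines the amplitude and
    duration asymmetries of the event (illustrative form)."""
    t_amp = (abs(a_rise) - abs(a_fall)) / (abs(a_rise) + abs(a_fall))
    t_dur = (d_rise - d_fall) / (d_rise + d_fall)
    return 0.5 * (t_amp + t_dur)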

Journal ArticleDOI
TL;DR: The perceived phonetic distance of L1 and L2 sounds was found to predict learning effects in discrimination of L1 and L2 sounds in some cases, and the role of experience in learning sounds in a second language was investigated.
Abstract: This study reports the results of two experiments with native speakers of Japanese. In experiment 1, near-monolingual Japanese listeners participated in a cross-language mapping experiment in which they identified English and Japanese consonants in terms of a Japanese category, then rated the identifications for goodness-of-fit to that Japanese category. Experiment 2 used the same set of stimuli in a categorial discrimination test. Three groups of Japanese speakers varying in English-language experience, and one group of native English speakers participated. Contrast pairs composed of two English consonants, two Japanese consonants, and one English and one Japanese consonant were tested. The results indicated that the perceived phonetic distance of second language (L2) consonants from the closest first language (L1) consonant predicted the discrimination of L2 sounds. In addition, this study investigated the role of experience in learning sounds in a second language. Some of the consonant contrasts tested showed evidence of learning (i.e., significantly higher scores for the experienced than the relatively inexperienced Japanese groups). The perceived phonetic distance of L1 and L2 sounds was found to predict learning effects in discrimination of L1 and L2 sounds in some cases. The results are discussed in terms of models of cross-language speech perception and L2 phonetic learning.

Journal ArticleDOI
TL;DR: A method for evaluating the acoustical properties of homogeneous and isotropic porous materials that may be modeled as fluids having complex properties is described here, and good agreement was found between the estimated acoustical properties and those predicted by using the formulas of Delany and Bazley.
Abstract: A method for evaluating the acoustical properties of homogeneous and isotropic porous materials that may be modeled as fluids having complex properties is described here. To implement the procedure, a conventional, two-microphone standing wave tube was modified to include: a new sample holder; a section downstream of the sample holder that accommodated a second pair of microphone holders and an approximately anechoic termination. Sound-pressure measurements at two upstream and two downstream locations were then used to estimate the two-by-two transfer matrix of porous material samples. The experimental transfer matrix method has been most widely used in the past to measure the acoustical properties of silencer system components. That procedure was made more efficient here by taking advantage of the reciprocal nature of sound transmission through homogeneous and isotropic porous layers. The transfer matrix of a homogeneous and isotropic, rigid or limp porous layer can easily be used to identify the material’s characteristic impedance and wave number, from which other acoustical quantities of interest can be calculated. The procedure has been used to estimate the acoustical properties of a glass fiber material: good agreement was found between the estimated acoustical properties and those predicted by using the formulas of Delany and Bazley.
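The Delany and Bazley relations referenced at the end, together with the fluid-layer transfer matrix that the method identifies, can be sketched as follows. The e^(j*omega*t) sign convention and the validity range are assumptions of this illustration.

import numpy as np

def delany_bazley(f, sigma, rho0=1.21, c0=343.0):
    """Empirical characteristic impedance Zc and wave number k of a fibrous
    porous material (Delany & Bazley), valid roughly for 0.01 < X < 1.
    sigma: static flow resistivity (Pa*s/m^2)."""
    X = rho0 * f / sigma
    Zc = rho0 * c0 * (1 + 0.0571 * X**-0.754 - 1j * 0.087 * X**-0.732)
    k = (2 * np.pi * f / c0) * (1 + 0.0978 * X**-0.700 - 1j * 0.189 * X**-0.595)
    return Zc, k

def layer_transfer_matrix(Zc, k, d):
    """2x2 transfer matrix of a homogeneous fluid-equivalent layer of
    thickness d, the quantity estimated by the four-microphone procedure."""
    return np.array([[np.cos(k * d), 1j * Zc * np.sin(k * d)],
                     [1j * np.sin(k * d) / Zc, np.cos(k * d)]])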

Journal ArticleDOI
TL;DR: Radiated noise directionality measurements indicate that the radiation is generally dipole in form at lower frequencies, as expected, but there are some departures from this pattern that may indicate hull interactions.
Abstract: Extensive measurements were made of the radiated noise of M/V OVERSEAS HARRIETTE, a bulk cargo ship (length 173 m, displacement 25 515 tons) powered by a direct-drive low-speed diesel engine—a design representative of many modern merchant ships. The radiated noise data show high-level tonal frequencies from the ship’s service diesel generator, main engine firing rate, and blade rate harmonics due to propeller cavitation. Radiated noise directionality measurements indicate that the radiation is generally dipole in form at lower frequencies, as expected. There are some departures from this pattern that may indicate hull interactions. Blade rate source level (174 dB re 1 μPa/m at 9 Hz, 16 knots) agrees reasonably well with a model of fundamental blade rate radiation previously reported by Gray and Greeley, but agreement for blade rate harmonics is not as good. Noise from merchant ships elevates the natural ambient by 20–30 dB in many areas; the effects of this noise on the biological environment have not been widely investigated.

Journal ArticleDOI
TL;DR: Echolocation signals were recorded from big brown bats, Eptesicus fuscus, flying in the field and the laboratory; in the terminal phase of insect capture sequences, Fmin decreased with decreasing signal duration.
Abstract: Echolocation signals were recorded from big brown bats, Eptesicus fuscus, flying in the field and the laboratory. In open field areas the interpulse intervals (IPI) of search signals were either around 134 ms or twice that value, 270 ms. At long IPI's the signals were of long duration (14 to 18-20 ms), narrow bandwidth, and low frequency, sweeping down to a minimum frequency (Fmin) of 22-25 kHz. At short IPI's the signals were shorter (6-13 ms), of higher frequency, and broader bandwidth. In wooded areas only short (6-11 ms) relatively broadband search signals were emitted at a higher rate (avg. IPI = 122 ms) with higher Fmin (27-30 kHz). In the laboratory the IPI was even shorter (88 ms), the duration was 3-5 ms, and the Fmin 30-35 kHz, resembling approach phase signals of field recordings. Excluding terminal phase signals, all signals from all areas showed a negative correlation between signal duration and Fmin, i.e., the shorter the signal, the higher was Fmin. This correlation was reversed in the terminal phase of insect capture sequences, where Fmin decreased with decreasing signal duration. Overall, the signals recorded in the field were longer, with longer IPI's and greater variability in bandwidth than signals recorded in the laboratory.

Journal ArticleDOI
TL;DR: A matrix formalism of the propagation operator is introduced to compare the time-reversal and inverse filter techniques and experiments investigated in various media are presented to illustrate this comparison.
Abstract: To focus ultrasonic waves in an unknown inhomogeneous medium using a phased array, one has to calculate the optimal set of signals to be applied on the transducers of the array. In the case of time-reversal mirrors, one assumes that a source is available at the focus, providing the Green’s function of this point. In this paper, the robustness of this time-reversal method is investigated when loss of information breaks the time-reversal invariance. It arises in dissipative media or when the field radiated by the source is not entirely measured by the limited aperture of a time-reversal mirror. However, in both cases, linearity and reciprocity relations ensure time reversal to achieve a spatiotemporal matched filtering. Nevertheless, though it provides robustness to this method, no constraints are imposed on the field out of the focus and sidelobes may appear. Another approach consists of measuring the Green’s functions associated to the focus but also to neighboring points. Thus, the whole information characterizing the medium is known and the inverse source problem can be solved. A matrix formalism of the propagation operator is introduced to compare the time-reversal and inverse filter techniques. Moreover, experiments investigated in various media are presented to illustrate this comparison.
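The matrix comparison can be sketched in a few lines: at one frequency, time reversal emits the conjugate transpose of the measured propagation operator applied to the target pattern, while the inverse filter applies a regularized pseudo-inverse of that operator. The names and the regularization scheme below are illustrative assumptions.

import numpy as np

def focusing_inputs(H, target, reg=1e-3):
    """Transducer drive vectors for one frequency bin, given the measured
    propagation operator H (control points x transducers) and a desired
    field 'target' at the control points (sketch)."""
    # time reversal: phase conjugation, i.e., a spatiotemporal matched filter
    e_tr = H.conj().T @ target
    # inverse filter: regularized SVD pseudo-inverse of H
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    s_inv = s / (s**2 + (reg * s.max())**2)      # Tikhonov-style damping
    e_if = Vh.conj().T @ (s_inv * (U.conj().T @ target))
    return e_tr, e_if

Evaluating H @ e for each drive vector illustrates the trade-off the abstract describes: time reversal maximizes amplitude at the focus but leaves the field elsewhere unconstrained, while the inverse filter fits the whole target pattern at the control points.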

PatentDOI
TL;DR: A method of tonsil reduction by thermal ablation using high intensity focused ultrasound includes the steps of introducing an ultrasound emitting member in a patient's oral cavity, positioning the ultrasound emitting member adjacent an external surface of one of the patient's tonsils, and withdrawing the ultrasound emitting member from the oral cavity.
Abstract: A method of tonsil reduction by thermal ablation using high intensity focused ultrasound energy includes the steps of introducing an ultrasound emitting member in a patient's oral cavity, positioning the ultrasound emitting member adjacent an external surface of one of the patient's tonsils, emitting ultrasound energy from the ultrasound emitting member into the tissue of the one tonsil, focusing the ultrasound energy in the one tonsil, ablating the tissue of the one tonsil with the focused ultrasound energy to form an ablated tissue area in the one tonsil containing unablated tissue of the one tonsil and a plurality of lesions at which the tissue of the one tonsil is ablated, and withdrawing the ultrasound emitting member from the oral cavity. The ablated tissue area is surgically removed or is allowed to remain in the patient's body. The lesions may be absorbed by the patient's body and/or remain as altered tissue such that the tonsil is reduced in size to correspondingly increase the size of the patient's airway and/or is stiffened to resist vibration. The lesions can begin a predetermined distance beneath the external surface of the tonsil such that the mucosa is preserved. The lesions end at a predetermined depth so that muscular tissue is not damaged.

Journal ArticleDOI
TL;DR: The rising chirp may be of clinical use in assessing the integrity of the entire peripheral organ and not just its basal end, and is compatible with earlier experimental results from recordings of compound action potentials (CAP).
Abstract: This study examines auditory brainstem responses (ABR) elicited by rising frequency chirps. The time course of frequency change for the chirp theoretically produces simultaneous displacement maxima by compensating for travel-time differences along the cochlear partition. This broadband chirp was derived on the basis of a linear cochlea model [de Boer, “Auditory physics. Physical principles in hearing theory I,” Phys. Rep. 62, 87–174 (1980)]. Responses elicited by the broadband chirp show a larger wave-V amplitude than do click-evoked responses for most stimulation levels tested. This result is in contrast to the general hypothesis that the ABR is an electrophysiological event most effectively evoked by the onset or offset of an acoustic stimulus, and unaffected by further stimulation. The use of this rising frequency chirp enables the inclusion of activity from lower frequency regions, whereas with a click, synchrony is decreased in accordance with decreasing traveling velocity in the apical region. The use of a temporally reversed (falling) chirp leads to a further decrease in synchrony as reflected in ABR responses that are smaller than those from a click. These results are compatible with earlier experimental results from recordings of compound action potentials (CAP) [Shore and Nuttall, “High synchrony compound action potentials evoked by rising frequency-swept tonebursts,” J. Acoust. Soc. Am. 78, 1286–1295 (1985)] reflecting activity at the level of the auditory nerve. Since the ABR components considered here presumably reflect neural response from the brainstem, the effect of an optimized synchronization at the peripheral level can also be observed at the brainstem level. The rising chirp may therefore be of clinical use in assessing the integrity of the entire peripheral organ and not just its basal end.
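A sketch of the construction: given an assumed cochlear travel-time model tau(f), the chirp plays each frequency earlier by its extra delay so that all components reach their places of maximum displacement simultaneously. The power-law delay form and its parameters below are illustrative stand-ins, not the linear cochlea model (de Boer) from which the study derived its chirp.

import numpy as np

def rising_chirp(fs=48000, f1=100.0, f2=10000.0, a=0.1, b=0.5):
    """Rising chirp compensating an assumed travel-time model
    tau(f) = a * f**(-b) (parameters are illustrative assumptions)."""
    tau = lambda f: a * f**(-b)
    dur = tau(f1) - tau(f2)                  # sweep duration implied by tau
    t = np.arange(0.0, dur, 1.0 / fs)
    # instantaneous frequency: the component whose residual delay equals dur - t
    f_inst = (a / (tau(f1) - t)) ** (1.0 / b)
    phase = 2 * np.pi * np.cumsum(f_inst) / fs
    return np.sin(phase)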

Journal ArticleDOI
TL;DR: The cerebral magnetic field of the auditory steady-state response (SSR) to sinusoidal amplitude-modulated (SAM) tones was recorded in healthy humans and waveforms of underlying cortical source activity were calculated at multiples of the modulation frequency to improve the signal-to-noise ratio (SNR).
Abstract: The cerebral magnetic field of the auditory steady-state response (SSR) to sinusoidal amplitude-modulated (SAM) tones was recorded in healthy humans. The waveforms of underlying cortical source activity were calculated at multiples of the modulation frequency using the method of source space projection, which improved the signal-to-noise ratio (SNR) by a factor of 2 to 4. Since the complex amplitudes of the cortical source activity were independent of the sensor position in relation to the subject’s head, a comparison of the results across experimental sessions was possible. The effect of modulation frequency on the amplitude and phase of the SSR was investigated at 30 different values between 10 and 98 Hz. At modulation frequencies between 10 and 20 Hz the SNR of harmonics near 40 Hz were predominant over the fundamental SSR. Above 30 Hz the SSR showed an almost sinusoidal waveform with an amplitude maximum at 40 Hz. The amplitude decreased with increasing modulation frequency but was significantly different from the magnetoencephalographic (MEG) background activity up to 98 Hz. Phase response at the fundamental and first harmonic decreased monotonically with increasing modulation frequency. The group delay (apparent latency) showed peaks of 72 ms at 20 Hz, 48 ms at 40 Hz, and 26 ms at 80 Hz. The effects of stimulus intensity, modulation depth, and carrier frequency on amplitude and phase of the SSR were also investigated. The SSR amplitude decreased linearly when stimulus intensity or the modulation depth were decreased in logarithmic steps. SSR amplitude decreased by a factor of 3 when carrier frequency increased from 250 to 4000 Hz. From the phase characteristics, time delays were found in the range of 0 to 6 ms for stimulus intensity, modulation depth, and carrier frequency, which were maximal at low frequencies, low intensities, or maximal modulation depth.
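The group delay (apparent latency) values can be reproduced from a measured phase response by differentiating the unwrapped phase with respect to angular modulation frequency, as in this small sketch:

import numpy as np

def apparent_latency(phase_rad, fm_hz):
    """Group delay ('apparent latency') from an SSR phase response:
    tau = -dphi/domega, estimated by numerical differentiation over the
    tested modulation frequencies."""
    phi = np.unwrap(np.asarray(phase_rad))
    return -np.gradient(phi, 2 * np.pi * np.asarray(fm_hz))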

Journal ArticleDOI
TL;DR: The fact that young children cannot recognize spectrally degraded speech as well as adults suggests that a long learning period is required for robust acoustic pattern recognition.
Abstract: Adult listeners are able to recognize speech even under conditions of severe spectral degradation. To assess the developmental time course of this robust pattern recognition, speech recognition was measured in two groups of children (5-7 and 10-12 years of age) as a function of the degree of spectral resolution. Results were compared to recognition performance of adults listening to the same materials and conditions. The spectral detail was systematically manipulated using a noise-band vocoder in which filtered noise bands were modulated by the amplitude envelope from the same spectral bands in speech. Performance scores between adults and older children did not differ statistically, whereas scores by younger children were significantly lower; they required more spectral resolution to perform at the same level as adults and older children. Part of the deficit in younger children was due to their inability to utilize fully the sensory information, and part was due to their incomplete linguistic/cognitive development. The fact that young children cannot recognize spectrally degraded speech as well as adults suggests that a long learning period is required for robust acoustic pattern recognition. These findings have implications for the application of auditory sensory devices for young children with early-onset hearing loss.
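A minimal sketch of the noise-band vocoder manipulation described above; the band count, logarithmic spacing, and filter orders are illustrative assumptions rather than the study's exact processing.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocoder(x, fs, n_bands=8, f_lo=100.0, f_hi=6000.0):
    """Noise-band vocoder: split speech into bands, extract each band's
    amplitude envelope, and use it to modulate noise filtered into the
    same band; fewer bands means coarser spectral resolution."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))        # band envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
        out += env * carrier                              # modulated noise band
    return out / (np.max(np.abs(out)) + 1e-12)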

Journal ArticleDOI
TL;DR: The method described here, spectrogram correlation, is well-suited to recognition of animal sounds consisting of tones and frequency sweeps and could be especially useful for detecting a call type when relatively few instances of the call type are known.
Abstract: A method is described for the automatic recognition of transient animal sounds. Automatic recognition can be used in wild animal research, including studies of behavior, population, and impact of anthropogenic noise. The method described here, spectrogram correlation, is well-suited to recognition of animal sounds consisting of tones and frequency sweeps. For a sound type of interest, a two-dimensional synthetic kernel is constructed and cross-correlated with a spectrogram of a recording, producing a recognition function—the likelihood at each point in time that the sound type was present. A threshold is applied to this function to obtain discrete detection events, instants at which the sound type of interest was likely to be present. An extension of this method handles the temporal variation commonly present in animal sounds. Spectrogram correlation was compared to three other methods that have been used for automatic call recognition: matched filters, neural networks, and hidden Markov models. The test data set consisted of bowhead whale (Balaena mysticetus) end notes from songs recorded in Alaska in 1986 and 1988. The method had a success rate of about 97.5% on this problem, and the comparison indicated that it could be especially useful for detecting a call type when relatively few (5–200) instances of the call type are known.
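A bare-bones version of spectrogram correlation is sketched below, assuming the synthetic kernel is supplied on the same frequency-bin grid as the spectrogram; kernel design (e.g., a ridge tracing the expected tone or sweep), normalization, and the detection threshold are left to the user and are not the paper's exact recipe.

import numpy as np
from scipy.signal import spectrogram

def recognition_function(x, fs, kernel, nperseg=512, noverlap=384):
    """Cross-correlate a 2-D time-frequency kernel with the spectrogram of a
    recording; peaks in the returned function mark instants at which the
    target sound type was likely present (sketch)."""
    f, t, S = spectrogram(x, fs, nperseg=nperseg, noverlap=noverlap)
    S = np.log(S + 1e-12)                  # log magnitude
    S -= S.mean()                          # remove overall offset
    kf, kt = kernel.shape                  # kernel: freq bins x time frames
    out = np.empty(S.shape[1] - kt + 1)
    for i in range(len(out)):
        out[i] = np.sum(kernel * S[:kf, i:i + kt])
    return t[:len(out)], out               # apply a threshold for detections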

PatentDOI
TL;DR: In this paper, a method for collecting data associated with the voice of a voice system user includes conducting a conversation with the user and capturing and digitizing a speech waveform of the user, extracting at least one acoustic feature from the digitized speech wave form and storing attribute data corresponding to the acoustic feature, together with an identifying indicia, in the data warehouse in a form to facilitate subsequent data mining.
Abstract: A method for collecting data associated with the voice of a voice system user includes conducting a conversation with the user, capturing and digitizing a speech waveform of the user, extracting at least one acoustic feature from the digitized speech waveform and storing attribute data corresponding to the acoustic feature, together with an identifying indicia, in the data warehouse in a form to facilitate subsequent data mining. User attributes can include gender, age, accent, native language, dialect, socioeconomic classification, educational level and emotional state. Data gathering can be repeated for a large number of users, until sufficient data is present. The attribute data to be stored can include raw acoustic features, or processed features, such as the user's emotional state, age, gender, socioeconomic group, and the like. In an alternative form of method, the user attribute can be used to real-time modify behavior of the voice system, with or without storage of data for subsequent data mining. An apparatus for collecting data associated with a voice of a user includes a dialog management unit, an audio capture module, an acoustic front end, a processing module and a data warehouse. The acoustic front end receives and digitizes a speech waveform from the user and extracts at least one acoustic feature from the digitized speech waveform. The feature is correlated with at least one user attribute. The processing module analyzes the acoustic feature to determine the user attribute, which can then be stored in the data warehouse. The dialog management unit can include, for example, a telephone interactive voice response system. The processor can be an application specific circuit, a separate general purpose computer with appropriate software, or a processor portion of the IVR. The processing module can include an emotional state classifier, a speaker clusterer and classifier, a speech recognizer, and/or an accent identifier. Alternatively, the apparatus can be configured as a real-time- modifiable voice system for interaction with a user, which can be used to practice the method for tailoring a voice system response.

Journal ArticleDOI
TL;DR: A finite-element model of the vocal fold that has provisions for asymmetry across the midplane, both from the geometric and tension point of view, which enables one to simulate certain kinds of voice disorders due to vocal-fold paralysis.
Abstract: A finite-element model of the vocal fold is developed from basic laws of continuum mechanics to obtain the oscillatory characteristics of the vocal folds. The model is capable of accommodating inhomogeneous, anisotropic material properties and irregular geometry of the boundaries. It has provisions for asymmetry across the midplane, both from the geometric and tension point of view, which enables one to simulate certain kinds of voice disorders due to vocal-fold paralysis. It employs the measured viscoelastic properties of the vocal-fold tissues. The detailed construction of the matrix differential equations of motion is presented followed by the solution scheme. Finally, typical results are presented and validated using an eigenvalue method and a commercial finite-element package (ABAQUS).
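The eigenvalue validation step amounts to solving the generalized eigenproblem K*phi = omega^2*M*phi for the assembled stiffness and mass matrices. A minimal sketch on stand-in matrices:

import numpy as np
from scipy.linalg import eigh

def modal_frequencies(K, M):
    """Natural frequencies (Hz) and mode shapes from assembled stiffness and
    mass matrices via the generalized eigenproblem (illustrative; K and M
    would come from the finite-element assembly)."""
    w2, phi = eigh(K, M)              # ascending eigenvalues omega^2
    w2 = np.clip(w2, 0.0, None)       # guard against small negative round-off
    return np.sqrt(w2) / (2 * np.pi), phi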

Journal ArticleDOI
TL;DR: A model based on a time-domain statement of causality is presented that describes observed power-law behavior of many viscoelastic materials and is compared to theories for loss mechanisms in dielectrics based on isolated polar molecules and cooperative interactions.
Abstract: Relaxation models fail to predict and explain loss characteristics of many viscoelastic materials which follow a frequency power law. A model based on a time-domain statement of causality is presented that describes observed power-law behavior of many viscoelastic materials. A Hooke’s law is derived from power-law loss characteristics; it reduces to the Hooke’s law for the Voigt model for the specific case of quadratic frequency loss. Broadband loss and velocity data for both longitudinal and shear elastic types of waves agree well with predictions. These acoustic loss models are compared to theories for loss mechanisms in dielectrics based on isolated polar molecules and cooperative interactions.
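For attenuation following alpha(omega) = alpha0*|omega|^y with y != 1, causality arguments of this kind lead to the widely quoted dispersion relation 1/c(omega) = 1/c(omega0) + alpha0*tan(pi*y/2)*(omega^(y-1) - omega0^(y-1)). The sketch below evaluates it; treat this exact form as an assumption of the illustration rather than the paper's own derivation.

import numpy as np

def phase_velocity(f, f_ref, c_ref, alpha0, y):
    """Phase speed implied by power-law attenuation alpha = alpha0*|omega|**y
    under the time-causal dispersion relation (valid for y != 1).
    alpha0 in Np*(rad/s)**(-y)/m; f, f_ref in Hz; c_ref in m/s."""
    w = 2 * np.pi * np.asarray(f, dtype=float)
    w0 = 2 * np.pi * f_ref
    slowness = (1.0 / c_ref
                + alpha0 * np.tan(np.pi * y / 2) * (w**(y - 1) - w0**(y - 1)))
    return 1.0 / slowness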

Journal ArticleDOI
TL;DR: The results suggest that fundamental frequency is controlled using auditory feedback and with reference to an internal pitch representation, consistent with current work on internal models of speech motor control.
Abstract: Hearing one’s own speech is important for language learning and maintenance of accurate articulation. For example, people with postlinguistically acquired deafness often show a gradual deterioration of many aspects of speech production. In this manuscript, data are presented that address the role played by acoustic feedback in the control of voice fundamental frequency (F0). Eighteen subjects produced vowels under a control (normal F0 feedback) and two experimental conditions: F0 shifted up and F0 shifted down. In each experimental condition subjects produced vowels during a training period in which their F0 was slowly shifted without their awareness. Following this exposure to transformed F0, their acoustic feedback was returned to normal. Two effects were observed. Subjects compensated for the change in F0 and showed negative aftereffects. When F0 feedback was returned to normal, the subjects modified their produced F0 in the opposite direction to the shift. The results suggest that fundamental frequency is controlled using auditory feedback and with reference to an internal pitch representation. This is consistent with current work on internal models of speech motor control.

Journal ArticleDOI
TL;DR: Differences between speaker groups emerged in segment durations, pausing behavior, and in the extent to which the SPL of [s] was affected; these effects were similar to those of increasing vocal effort.
Abstract: The acoustic effects of the adjustment in vocal effort that is required when the distance between speaker and addressee is varied over a large range (0.3–187.5 m) were investigated in phonated and, at shorter distances, also in whispered speech. Several characteristics were studied in the same sentence produced by men, women, and 7-year-old boys and girls: duration of vowels and consonants, pausing and occurrence of creaky voice, mean and range of F0, certain formant frequencies (F1 in [a] and F3), sound-pressure level (SPL) of voiced segments and [s], and spectral emphasis. In addition to levels and emphasis, vowel duration, F0, and F1 were substantially affected. “Vocal effort” was defined as the communication distance estimated by a group of listeners for each utterance. Most of the observed effects correlated better with this measure than with the actual distance, since some additional factors affected the speakers’ choice. Differences between speaker groups emerged in segment durations, pausing behavior, and in the extent to which the SPL of [s] was affected.

PatentDOI
TL;DR: In this paper, a malleable handle shaft is used to orient an active face of an ultrasound emitting member at a distal end of the shaft to contact anatomical tissue at a selected operative site.
Abstract: A focused ultrasound ablation device includes an ultrasound emitting member and a handle shaft having a distal end at which the ultrasound emitting member is disposed. The handle shaft is malleable to permit selective, manual shaping of the handle shaft to access a selected anatomical operative site from a remote location and/or to orient the ultrasound emitting member for contact with anatomical tissue at the selected operative site. A method of thermally ablating anatomical tissue includes the steps of manually shaping a malleable handle shaft to orient an active face of an ultrasound emitting member at a distal end of the shaft to contact anatomical tissue at a selected operative site, positioning the active face against the anatomical tissue at the operative site while a proximal end of the handle shaft is disposed at a remote location; emitting ultrasound energy from the ultrasound emitting member; focusing the ultrasound energy at one or more focusing zones within a target area in the tissue and located a predetermined distance in front of the active face and heating the tissue at the target area with the focused ultrasound energy to create a lesion.

Journal ArticleDOI
TL;DR: Three experimental paradigms were used to specify the auditory system's frequency selectivity for amplitude modulation (AM) using an envelope power-spectrum model (EPSM) which integrates the envelope power of the modulation masker in the passband of a modulation filter tuned to the signal-modulation frequency.
Abstract: Three experimental paradigms were used to specify the auditory system's frequency selectivity for amplitude modulation (AM). In the first experiment, masked-threshold patterns were obtained for signal-modulation frequencies of 4, 16, 64, and 256 Hz in the presence of a half-octave-wide modulation masker, both applied to the same noise carrier with a bandwidth ranging from 1 to 4 kHz. In the second experiment, psychophysical tuning curves (PTCs) were obtained for signal-modulation frequencies of 16 and 64 Hz imposed on a noise carrier as in the first experiment. In the third experiment, masked thresholds for signal-modulation frequencies of 8, 16, 32, and 64 Hz were obtained according to the "classical" band-widening paradigm, where the bandwidth of the modulation masker ranged from 1/8 to 4 octaves, geometrically centered on the signal frequency. The first two experiments allowed a direct derivation of the shape of the modulation filters while the latter paradigm only provided an indirect estimate of the filter bandwidth. Thresholds from the experiments were predicted on the basis of an envelope power-spectrum model (EPSM) which integrates the envelope power of the modulation masker in the passband of a modulation filter tuned to the signal-modulation frequency. The Q-value of second-order bandpass modulation filters was fitted to the masking patterns from the first experiment using a least-squares algorithm. Q-values of about 1 for frequencies up to 64 Hz suggest an even weaker selectivity for modulation than assumed in earlier studies. The same model also accounted reasonably well for the shape of the temporal modulation transfer function (TMTF) obtained for carrier bandwidths in the range from 1 to 6000 Hz. Peripheral filtering and effects of peripheral compression were also investigated using a multi-channel version of the model. Waveform compression did not influence the simulated results. Peripheral bandpass filtering only influenced thresholds for high modulation frequencies when signal information was strongly attenuated by the transfer function of the peripheral filters.
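A sketch of the EPSM decision variable follows: the envelope power of a signal transmitted by a second-order bandpass modulation filter of quality factor Q tuned to the signal-modulation frequency (Q near 1 per the paper's fits). The Welch settings and the |H|^2 form of the resonator are illustrative assumptions.

import numpy as np
from scipy.signal import hilbert, welch

def env_power_in_band(x, fs, fc, Q=1.0):
    """Envelope power of 'x' passed by a second-order bandpass modulation
    filter centered at fc (Hz) with quality factor Q (EPSM-style sketch).
    Thresholds are then predicted by comparing signal-induced and
    masker-induced envelope power in the same filter."""
    env = np.abs(hilbert(x))                    # Hilbert envelope
    env = env - env.mean()                      # AC envelope power only
    f, P = welch(env, fs, nperseg=min(4096, len(env)))
    H2 = 1.0 / (1.0 + Q**2 * (f / fc - fc / np.maximum(f, 1e-9))**2)
    return np.sum(P * H2) * (f[1] - f[0])       # integrated envelope power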

PatentDOI
TL;DR: In this article, a flexible elongate body with proximal (14) and distal (12) ends is used for tissue ablation, and a plurality of spaced-apart electrodes are attached to the flexible body near the distal end.
Abstract: The present invention provides ultrasound-guided ablation catheters and methods for their use. In one embodiment, a tissue ablation apparatus (2) includes a flexible elongate body (12) having proximal (14) and distal (12) ends. A plurality of spaced-apart electrodes (24) are operably attached to the flexible body near the distal end. A plurality of transducer elements (28) are disposed between at least some of the electrodes. Transducers assist the physician in determining whether or not the ablation elements are in contact with the tissue to be ablated.