
Showing papers in "Journal of the Acoustical Society of America in 1999"


PatentDOI
TL;DR: In this article, a 3D tracking and imaging system is used to carry out a medical procedure using a catheter, probe, sensor, pacemaker lead, needle, or the like, and the position of the surgical instrument is tracked as it moves through a medium in a bodily structure.
Abstract: A method for carrying out a medical procedure using a 3-D tracking and imaging system (1600). A surgical instrument, such as a catheter, probe, sensor, pacemaker lead, needle, or the like, is inserted into a living being, and the position of the surgical instrument is tracked as it moves through a medium in a bodily structure. The location of the surgical instrument relative to its immediate surroundings is displayed to improve a physician's ability to precisely position the surgical instrument. The medical procedures include targeted drug delivery, sewing sutures, removal of an obstruction from the circulatory system, a biopsy, amniocentesis, brain surgery, measurement of cervical dilation, evaluation of knee stability, assessment of myocardial contractibility, eye surgery, prostate surgery, trans-myocardial revascularization (TMR), robotic surgery, and evaluation of RF transmissions.

959 citations


PatentDOI
Scott D. Wampler1
TL;DR: In this paper, an ultrasonic surgical device for the application of ultrasonic energy is disclosed, which has a housing and an acoustic assembly having a solid core waveguide, and two novel end effectors are described having an embedding surface at the distal end, and a coagulating surface extending from the embedding surfaces.
Abstract: An ultrasonic surgical device for the application of ultrasonic energy is disclosed. The surgical device has a housing and an acoustic assembly having a solid core waveguide. The waveguide extends from the housing and has a novel end effector at the distal end for the conduction of ultrasonic energy thereto. Two novel end effectors are described having an embedding surface at the distal end, and a coagulating surface extending from the embedding surface. The first novel end effector has a cylindrical shaft, an embedding surface, and an angled coagulating surface. The second novel end effector has a truncated cone having a circumferential coagulating surface and a distal embedding surface.

799 citations


Journal ArticleDOI
TL;DR: The results confirm that the reduction in magnitude and within-subject variability of both temporal and spectral acoustic parameters with age is a major trend associated with speech development in normal children, and support the hypothesis of uniform axial growth of the vocal tract for male speakers.
Abstract: Changes in magnitude and variability of duration, fundamental frequency, formant frequencies, and spectral envelope of children’s speech are investigated as a function of age and gender using data obtained from 436 children, ages 5 to 17 years, and 56 adults. The results confirm that the reduction in magnitude and within-subject variability of both temporal and spectral acoustic parameters with age is a major trend associated with speech development in normal children. Between ages 9 and 12, both magnitude and variability of segmental durations decrease significantly and rapidly, converging to adult levels around age 12. Within-subject fundamental frequency and formant-frequency variability, however, may reach adult range about 2 or 3 years later. Differentiation of male and female fundamental frequency and formant frequency patterns begins at around age 11, becoming fully established around age 15. During that time period, changes in vowel formant frequencies of male speakers are approximately linear with...

764 citations


Journal ArticleDOI
TL;DR: The purpose of this review is to provide a framework within which to describe the effects of precedence and to help in the integration of data from both psychophysical and physiological experiments, and it is probably only through the combined efforts of these fields that a full theory of precedence will evolve and useful models will be developed.
Abstract: In a reverberant environment, sounds reach the ears through several paths. Although the direct sound is followed by multiple reflections, which would be audible in isolation, the first-arriving wavefront dominates many aspects of perception. The “precedence effect” refers to a group of phenomena that are thought to be involved in resolving competition for perception and localization between a direct sound and a reflection. This article is divided into five major sections. First, it begins with a review of recent work on psychoacoustics, which divides the phenomena into measurements of fusion, localization dominance, and discrimination suppression. Second, buildup of precedence and breakdown of precedence are discussed. Third, measurements in several animal species, developmental changes in humans, and animal studies are described. Fourth, recent physiological measurements that might be helpful in providing a fuller understanding of precedence effects are reviewed. Fifth, a number of psychophysical models a...

744 citations


Journal ArticleDOI
TL;DR: Findings have implications for speech recognition, speech forensics, and the evolution of the human speech production system, and provide a normative standard for future studies of human vocal tract morphology and development.
Abstract: Magnetic resonance imaging was used to quantify the vocal tract morphology of 129 normal humans, aged 2–25 years. Morphometric data, including midsagittal vocal tract length, shape, and proportions, were collected using computer graphic techniques. There was a significant positive correlation between vocal tract length and body size (either height or weight). The data also reveal clear differences in male and female vocal tract morphology, including changes in overall vocal tract length and the relative proportions of the oral and pharyngeal cavity. These sex differences are not evident in children, but arise at puberty, suggesting that they are part of the vocal remodeling process that occurs during puberty in males. These findings have implications for speech recognition, speech forensics, and the evolution of the human speech production system, and provide a normative standard for future studies of human vocal tract morphology and development. © 1999 Acoustical Society of America. [S0001-4966(99)02008-1]

734 citations


Journal ArticleDOI
TL;DR: Two broad classes of emissions--reflection-source and distortion-source emissions--are distinguished based on the mechanisms of their generation, and the implications of this OAE taxonomy for the measurement, interpretation, and clinical use of otoacoustic emissions as noninvasive probes of cochlear function are discussed.
Abstract: Otoacoustic emissions (OAEs) of all types are widely assumed to arise by a common mechanism: nonlinear electromechanical distortion within the cochlea. In this view, both stimulus-frequency (SFOAEs) and distortion-product emissions (DPOAEs) arise because nonlinearities in the mechanics act as "sources" of backward-traveling waves. This unified picture is tested by analyzing measurements of emission phase using a simple phenomenological description of the nonlinear re-emission process. The analysis framework is independent of the detailed form of the emission sources and the nonlinearities that produce them. The analysis demonstrates that the common assumption that SFOAEs originate by nonlinear distortion requires that SFOAE phase be essentially independent of frequency, in striking contradiction with experiment. This contradiction implies that evoked otoacoustic emissions arise by two fundamentally different mechanisms within the cochlea. These two mechanisms (linear reflection versus nonlinear distortion) are described and two broad classes of emissions--reflection-source and distortion-source emissions--are distinguished based on the mechanisms of their generation. The implications of this OAE taxonomy for the measurement, interpretation, and clinical use of otoacoustic emissions as noninvasive probes of cochlear function are discussed.
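
The workhorse quantity in this kind of phase analysis is the phase-gradient (group) delay of the emission. A minimal sketch of that computation, assuming unwrapped phase in cycles on a frequency grid in Hz; the function name is ours, not the paper's:

```python
import numpy as np

def phase_gradient_delay(freq_hz, phase_cycles):
    """Group delay tau(f) = -d(phase)/df, with phase in cycles, tau in s.

    Under cochlear scaling symmetry, distortion-source emissions have
    phase nearly independent of frequency, so tau ~ 0; the steep,
    rapidly rotating phase actually measured for SFOAEs yields long
    delays, the signature of reflection-source emissions.
    """
    return -np.gradient(np.asarray(phase_cycles, float),
                        np.asarray(freq_hz, float))
```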

664 citations


PatentDOI
TL;DR: In this paper, a system and method for interacting with a computer using utterances, speech processing and natural language processing is presented, which consists of a speech processor for searching a first grammar file for a matching phrase for the utterance, and for searching another grammar file if the matching phrase is not found in the first grammar file.
Abstract: A system and method for interacting with a computer using utterances, speech processing and natural language processing. The system comprises a speech processor for searching a first grammar file for a matching phrase for the utterance, and for searching a second grammar file for the matching phrase if the matching phrase is not found in the first grammar file. The system also includes a natural language processor for searching a database for a matching entry for the matching phrase; and an application interface for performing an action associated with the matching entry if the matching entry is found in the database. The system utilizes context-specific grammars, thereby enhancing speech recognition and natural language processing efficiency. Additionally, the system adaptively and interactively 'learns' words and phrases, and their associated meanings.
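
A minimal sketch of the claimed two-tier lookup, assuming grammar files are plain text with one phrase per line and the database is an in-memory dict; names and file formats here are hypothetical, not the patent's implementation:

```python
def handle_utterance(utterance, grammar_paths, database):
    """Search the first grammar file for a phrase matching the utterance,
    falling back to the next grammar file if no match is found; then look
    the matched phrase up in the database and return the associated action
    for the application interface to perform."""
    phrase = utterance.strip().lower()
    for path in grammar_paths:                    # first grammar, then fallbacks
        with open(path) as fh:
            grammar = {line.strip().lower() for line in fh if line.strip()}
        if phrase in grammar:                     # matching phrase found
            entry = database.get(phrase)          # natural-language lookup
            return entry["action"] if entry else None
    return None                                   # no grammar matched

# Example: handle_utterance("open mail", ["context.gram", "global.gram"],
#                           {"open mail": {"action": "launch_mail_client"}})
```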

530 citations


Journal ArticleDOI
TL;DR: This study investigated whether the distinct and separate localization of speech and interference provides any perceptual advantage that, due to the precedence effect, is not degraded by reflections.
Abstract: Spatial separation of speech and noise in an anechoic space creates a release from masking that often improves speech intelligibility. However, the masking release is severely reduced in reverberant spaces. This study investigated whether the distinct and separate localization of speech and interference provides any perceptual advantage that, due to the precedence effect, is not degraded by reflections. Listeners’ identification of nonsense sentences spoken by a female talker was measured in the presence of either speech-spectrum noise or other sentences spoken by a second female talker. Target and interference stimuli were presented in an anechoic chamber from loudspeakers directly in front and 60 degrees to the right in single-source and precedence-effect (lead-lag) conditions. For speech-spectrum noise, the spatial separation advantage for speech recognition (8 dB) was predictable from articulation index computations based on measured release from masking for narrow-band stimuli. The spatial separation advantage was only 1 dB in the lead-lag condition, despite the fact that a large perceptual separation was produced by the precedence effect. For the female talker interference, a much larger advantage occurred, apparently because informational masking was reduced by differences in perceived locations of target and interference.

485 citations


Journal ArticleDOI
TL;DR: Using the high-variability paradigm, eight American learners of Mandarin were trained in eight sessions during the course of two weeks to identify the four tones in natural words produced by native Mandarin talkers, and the analogies between L2 acquisition processes at the segmental and suprasegmental levels are discussed.
Abstract: Auditory training has been shown to be effective in the identification of non-native segmental distinctions. In this study, it was investigated whether such training is applicable to the acquisition of non-native suprasegmental contrasts, i.e., Mandarin tones. Using the high-variability paradigm, eight American learners of Mandarin were trained in eight sessions during the course of two weeks to identify the four tones in natural words produced by native Mandarin talkers. The trainees' identification accuracy revealed an average 21% increase from the pretest to the post-test, and the improvement gained in training was generalized to new stimuli (18% increase) and to new talkers and stimuli (25% increase). Moreover, the six-month retention test showed that the improvement was retained long after training by an average 21% increase from the pretest. The results are discussed in terms of non-native suprasegmental perceptual modification, and the analogies between L2 acquisition processes at the segmental and suprasegmental levels.

458 citations


Journal ArticleDOI
TL;DR: The later in life the native Italian subjects began to learn English, the less accurately they produced and perceived English vowels, which is consistent with the hypothesis of the speech learning model that early bilinguals establish new categories for vowels found in the second language (L2).
Abstract: This study examined the production and perception of English vowels by highly experienced native Italian speakers of English. The subjects were selected on the basis of the age at which they arrived in Canada and began to learn English, and how much they continued to use Italian. Vowel production accuracy was assessed through an intelligibility test in which native English-speaking listeners attempted to identify vowels spoken by the native Italian subjects. Vowel perception was assessed using a categorial discrimination test. The later in life the native Italian subjects began to learn English, the less accurately they produced and perceived English vowels. Neither of two groups of early Italian/English bilinguals differed significantly from native speakers of English either for production or perception. This finding is consistent with the hypothesis of the speech learning model [Flege, in Speech Perception and Linguistic Experience: Theoretical and Methodological Issues (York, Timonium, MD, 1995)] that early bilinguals establish new categories for vowels found in the second language (L2). The significant correlation observed to exist between the measures of L2 vowel production and perception is consistent with another hypothesis of the speech learning model, viz., that the accuracy with which L2 vowels are produced is limited by how accurately they are perceived.

416 citations


Journal ArticleDOI
TL;DR: The experiments reported here support Wallach's hypothesis and suggest further that head movements are not required to produce the dynamic cues needed to resolve front-back ambiguity.
Abstract: Normally, the apparent position of a sound source corresponds closely to its actual position. However, in some experimental situations listeners make large errors, such as indicating that a source in the frontal hemifield appears to be in the rear hemifield, or vice versa. These front–back confusions are thought to be a result of the inherent ambiguity of the primary interaural difference cues, interaural time difference (ITD) in particular. A given ITD could have been produced by a sound source anywhere on the so-called “cone of confusion.” More than 50 years ago Wallach [J. Exp. Psychol. 27, 339–368 (1940)] argued that small head movements could provide the information necessary to resolve the ambiguity. The direction of the change in ITD that accompanies a head rotation is an unambiguous indicator of the proper hemifield. The experiments reported here are a modern test of Wallach’s hypothesis. Listeners indicated the apparent positions of real and virtual sound sources in conditions in which head movem...
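
The geometry behind the dynamic cue is easy to reproduce. A sketch using the common low-frequency approximation ITD ≈ (3a/c)·sin θ for a spherical head of radius a; all parameter values are illustrative:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, nominal adult head

def itd_s(source_az_deg, head_az_deg=0.0):
    """Low-frequency ITD for a distant source, ITD ~ (3a/c) sin(theta),
    with theta the source azimuth relative to the (possibly rotated)
    head; 0 deg = straight ahead, positive = right."""
    theta = np.radians(source_az_deg - head_az_deg)
    return 3.0 * HEAD_RADIUS / SPEED_OF_SOUND * np.sin(theta)

# A front source at 10 deg and its rear mirror at 170 deg give the same
# static ITD (the cone of confusion), but a 10-deg rightward head turn
# moves the two ITDs in opposite directions -- Wallach's dynamic cue:
for az in (10.0, 170.0):
    print(f"{az:5.1f} deg: {itd_s(az)*1e6:+6.1f} us -> "
          f"{itd_s(az, head_az_deg=10.0)*1e6:+6.1f} us after head turn")
```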

PatentDOI
TL;DR: A real-time speech-based learning/training system distributed between client and server, and incorporating speech recognition and linguistic processing for recognizing a spoken question and providing an answer to the student in a learning or training environment implemented on an intranet or over the Internet, is disclosed.
Abstract: A real-time speech-based learning/training system distributed between client and server, and incorporating speech recognition and linguistic processing for recognizing a spoken question and providing an answer to the student in a learning or training environment implemented on an intranet or over the Internet, is disclosed. The system accepts the student's question in the form of speech at his or her computer, PDA or workstation where minimal processing extracts a sufficient number of acoustic speech vectors representing the utterance. The system as implemented accepts environmental variables such as course, chapter, section as selected by the user so that the search time, accuracy and response time for the question can be optimized. A minimum set of acoustic vectors extracted at the client is then sent via a communications channel to the server where additional acoustic vectors are derived. Using Hidden Markov Models (HMMs), and appropriate grammars and dictionaries conditioned by the course, chapter and section selections made by the student, the speech representing the user's query is fully decoded into text at the server. This text corresponding to the user's query is then simultaneously sent to a natural language engine and a database processor where an optimized SQL statement is constructed for a full-text search from a SQL database for a recordset of several stored questions that best matches the user's query. Further processing in the natural language engine narrows the search down to a single stored question. The answer that is paired with this single stored question is then retrieved from the file path and sent to the student computer in compressed form. At the student's computer, the answer is articulated using a text-to-speech engine in his or her native natural language. The system requires no training and can operate in several natural languages.

Journal ArticleDOI
TL;DR: The results suggest that binaural cues play an important role in auditory distance perception for nearby sources and that the interaural level difference increases substantially for lateral sources as distance decreases below 1 m, even at low frequencies where the ILD is small for distant sources.
Abstract: Although researchers have long recognized the unique properties of the head-related transfer function (HRTF) for nearby sources (within 1 m of the listener’s head), virtually all of the HRTF measurements described in the literature have focused on source locations 1 m or farther from the listener. In this study, HRTFs for sources at distances from 0.12 to 1 m were calculated using a rigid-sphere model of the head and measured using a Knowles Electronic Manikin for Acoustic Research (KEMAR) and an acoustic point source. Both the calculations and the measurements indicate that the interaural level difference (ILD) increases substantially for lateral sources as distance decreases below 1 m, even at low frequencies where the ILD is small for distant sources. In contrast, the interaural time delay (ITD) is roughly independent of distance even when the source is close. The KEMAR measurements indicate that the direction of the source relative to the outer ear plays an important role in determining the high-frequency response of the HRTF in the horizontal plane. However, the elevation-dependent characteristics of the HRTFs are not strongly dependent on distance, and the contribution of the pinna to the HRTF is independent of distance beyond a few centimeters from the ear. Overall, the results suggest that binaural cues play an important role in auditory distance perception for nearby sources.
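
The low-frequency part of this effect falls out of spherical spreading alone. A sketch that deliberately ignores head diffraction and treats the two ears as free-field points on the interaural axis (ear spacing illustrative):

```python
import numpy as np

EAR_OFFSET = 0.0875  # m; ears at +/- EAR_OFFSET on the interaural axis

def geometric_ild_db(distance_m, az_deg):
    """ILD from 1/r spreading only (no head shadow): point source in the
    horizontal plane at the given range and azimuth (0 = front, 90 = right)."""
    az = np.radians(az_deg)
    src = distance_m * np.array([np.sin(az), np.cos(az)])
    right_ear = np.array([EAR_OFFSET, 0.0])
    left_ear = np.array([-EAR_OFFSET, 0.0])
    return 20.0 * np.log10(np.linalg.norm(src - left_ear) /
                           np.linalg.norm(src - right_ear))

for d in (1.0, 0.5, 0.25, 0.12):   # distances spanning the study's range
    print(f"{d:4.2f} m, 90 deg: {geometric_ild_db(d, 90.0):5.1f} dB")
# The geometric ILD grows from ~1.5 dB at 1 m to ~16 dB at 0.12 m, while
# the path-length ITD stays nearly constant -- the trend that both the
# rigid-sphere calculations and the KEMAR measurements show.
```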

Patent
TL;DR: A catheter (30) and a method for treating a target region in a body lumen comprising directing a uniform dose of ultrasonic energy from an interior of the lumen, wherein the dosage of ultrasonic energy received at any one point along the length varies by no more than plus or minus 6 decibels, as discussed by the authors.
Abstract: A catheter (30), and method for treating a target region in a body lumen comprising directing a uniform dose of ultrasonic energy from an interior of the lumen, wherein the dosage of ultrasonic energy received at any one point along the length varies by no more than plus or minus 6 decibels. One catheter (30) has an array of transducers (20) which emit ultrasound energy in directions (D1, D2).

Journal ArticleDOI
TL;DR: In this article, the effect of demographic variables (sex, age, education level, occupational status, size of household, homeownership, dependency on the noise source, and use of noise source) and two attitudinal variables (noise sensitivity and fear of the noise sources) on noise annoyance was investigated.
Abstract: The effect of demographic variables (sex, age, education level, occupational status, size of household, homeownership, dependency on the noise source, and use of the noise source) and two attitudinal variables (noise sensitivity and fear of the noise source) on noise annoyance is investigated. It is found that fear and noise sensitivity have a large impact on annoyance (DNL equivalent equal to [at most] 19 and 11 dB, respectively). Demographic factors are much less important. Noise annoyance is not related to gender, but age has an effect (DNL equivalent equal to 5 dB). The effects of the other demographic factors on noise annoyance are (very) small, i.e., the equivalent DNL difference is equal to 1-2 dB, and, in the case of dependency, 3 dB. The results are based on analyses of the original data from various previous field surveys of response to noise from transportation sources (number of cases depending on the variable between 15,000 and 42,000).

PatentDOI
TL;DR: In this paper, the authors propose reversing ultrasound image processing steps such as focal and depth gain compensation, dynamic range compression, intensity or color mapping, and various filtering, such as persistence or spatial filtering.
Abstract: Ultrasound data is generated by a receive beamformer. Ultrasound image processing is applied to the ultrasound data for presentation of an image. Various ones of the ultrasound image processing steps may be reversed. For example, persistence processing may be reversed in order to obtain ultrasound data associated with data prior to persistence processing. This recovered data may be used to generate an image or for application of a different amount of persistence. Other processes that may be reversed to recover ultrasound data include focal and depth gain compensation, dynamic range compression, intensity or color mapping, and various filtering, such as persistence or spatial filtering.
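
Persistence, for example, is typically first-order recursive frame averaging, which is exactly invertible when the coefficient is known. A sketch of that round trip; the coefficient and array shapes are illustrative, not the patent's actual pipeline:

```python
import numpy as np

PERSISTENCE = 0.3  # alpha; illustrative coefficient

def apply_persistence(frames, alpha=PERSISTENCE):
    """First-order recursive (IIR) frame averaging, a common form of
    ultrasound persistence: y[n] = alpha*x[n] + (1-alpha)*y[n-1]."""
    out = np.empty_like(frames, dtype=float)
    out[0] = frames[0]
    for n in range(1, len(frames)):
        out[n] = alpha * frames[n] + (1 - alpha) * out[n - 1]
    return out

def reverse_persistence(persisted, alpha=PERSISTENCE):
    """Invert the recursion to recover pre-persistence frames:
    x[n] = (y[n] - (1-alpha)*y[n-1]) / alpha."""
    out = np.empty_like(persisted, dtype=float)
    out[0] = persisted[0]
    out[1:] = (persisted[1:] - (1 - alpha) * persisted[:-1]) / alpha
    return out

frames = np.random.default_rng(0).random((8, 4, 4))  # 8 toy frames
assert np.allclose(reverse_persistence(apply_persistence(frames)), frames)
```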

Journal ArticleDOI
TL;DR: A method is described to select sentence materials for efficient measurement of the speech reception threshold (SRT), and the result is a set of 1272 sentences, where every sentence has been uttered by two male and two female speakers.
Abstract: Due to technological advancements, modern hearing aids may have many adjustable parameters, multiple memories, and the ability to house all sorts of signal‐processing algorithms. To enable a systematic evaluation of the speech intelligibility for a variety of hearing‐aid settings, large sets of speech materials are required. This paper reports on the creation and evaluation of a set of 1272 sentences uttered by two male and two female speakers. Two subsets were formed (one for a male speaker and one for a female speaker) to enable efficient measurement of the speech reception threshold in stationary speech‐shaped noise. Each subset consists of 39 lists, each comprising 13 sentences. The properties of the new subsets are comparable to the existing sets that are used in clinical practice. [Work supported by the Heinsius Houbolt foundation.]
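
Lists like these are built for the standard adaptive SRT procedure: one sentence per trial, SNR lowered after a correct response and raised after an error, with the SRT taken as the mean of the later presentation levels. A toy simulation of one track, with a hypothetical logistic listener standing in for a subject; all values are illustrative:

```python
import random

def measure_srt(sentence_correct, n_sentences=13, start_snr=-10.0, step=2.0):
    """One adaptive SRT track: 1-up/1-down in `step`-dB steps over a
    13-sentence list; the SRT estimate is the mean SNR of the later
    trials (the first trials are treated as convergence and dropped)."""
    snr, levels = start_snr, []
    for _ in range(n_sentences):
        levels.append(snr)
        snr += -step if sentence_correct(snr) else step
    return sum(levels[4:]) / len(levels[4:])

random.seed(1)
# Toy listener: correct-response probability is logistic around -5 dB SNR.
listener = lambda snr: random.random() < 1.0 / (1.0 + 10.0 ** (-(snr + 5.0) / 2.0))
print(f"estimated SRT: {measure_srt(listener):.1f} dB SNR")
```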

Journal ArticleDOI
TL;DR: A computational auditory model is presented that exhibits spectro-temporal MTFs consistent with the salient trends in the data and is used to demonstrate the potential relevance of these MTFs to the assessment of speech intelligibility in noise and reverberant conditions.
Abstract: Detection thresholds for spectral and temporal modulations are measured using broadband spectra with sinusoidally rippled profiles that drift up or down the log-frequency axis at constant velocities. Spectro-temporal modulation transfer functions (MTFs) are derived as a function of ripple peak density (Ω cycles/octave) and drifting velocity (ω Hz). The MTFs exhibit a low-pass function with respect to both dimensions, with 50% bandwidths of about 16 Hz and 2 cycles/octave. The data replicate (as special cases) previously measured purely temporal MTFs (Ω=0) [Viemeister, J. Acoust. Soc. Am. 66, 1364–1380 (1979)] and purely spectral MTFs (ω=0) [Green, in Auditory Frequency Selectivity (Plenum, Cambridge, 1986), pp. 351–359]. A computational auditory model is presented that exhibits spectro-temporal MTFs consistent with the salient trends in the data. The model is used to demonstrate the potential relevance of these MTFs to the assessment of speech intelligibility in noise and reverberant conditions.
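
A minimal sketch of the stimulus construction described above: tones spaced evenly in log frequency, each amplitude-modulated so the spectral envelope is a sinusoidal ripple of Ω cycles/octave drifting at ω Hz along the log-frequency axis (component count and defaults illustrative):

```python
import numpy as np

def moving_ripple(dur=1.0, fs=16000, f0=250.0, octaves=5.0, n_comp=100,
                  density=2.0, velocity=16.0, depth=0.9, seed=0):
    """Broadband moving-ripple stimulus: n_comp tones spaced evenly in
    log frequency from f0 upward, each amplitude-modulated so that the
    spectral envelope is a sinusoid of `density` cycles/octave drifting
    at `velocity` Hz along the log-frequency axis."""
    t = np.arange(int(dur * fs)) / fs
    x = np.linspace(0.0, octaves, n_comp)          # octaves above f0
    rng = np.random.default_rng(seed)
    phases = rng.uniform(0.0, 2 * np.pi, n_comp)   # random carrier phases
    sig = np.zeros_like(t)
    for xi, ph in zip(x, phases):
        env = 1.0 + depth * np.sin(2 * np.pi * (velocity * t + density * xi))
        sig += env * np.sin(2 * np.pi * (f0 * 2.0 ** xi) * t + ph)
    return sig / np.abs(sig).max()

ripple = moving_ripple()  # e.g., Omega = 2 cyc/oct drifting at omega = 16 Hz
```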

PatentDOI
TL;DR: A high quality speech synthesizer in various embodiments concatenates speech waveforms referenced by a large speech database as mentioned in this paper, which is further improved by speech unit selection and concatenation smoothing.
Abstract: A high quality speech synthesizer in various embodiments concatenates speech waveforms referenced by a large speech database. Speech quality is further improved by speech unit selection and concatenation smoothing.

PatentDOI
TL;DR: In this article, an ultrasonic therapeutic apparatus consisting of a therapeutic ultrasonic wave generating source driven by a driver circuit was used to obtain a tissue tomographic image in the vicinity of the focus of the therapeutic ultrasound waves.
Abstract: An ultrasonic therapeutic apparatus consists of a therapeutic ultrasonic wave generating source, driven by a driver circuit to generate therapeutic ultrasonic waves, and an in vivo imaging probe for obtaining a tissue tomographic image in the vicinity of the focus of the therapeutic ultrasonic waves. The imaging probe is used to receive echoes of the ultrasonic pulses emitted from the therapeutic ultrasonic wave generating source. The driving conditions for the therapeutic ultrasonic wave generating source are adjusted on the basis of a received echo signal. The received echo signal contains information about the actual intensity of the therapeutic ultrasonic waves within a living body, thus improving the safety and reliability of therapy.

PatentDOI
TL;DR: In this paper, a transducer having an operative surface is disposed substantially adjacent to the wound to emit ultrasound that propagates in the direction of the wound, to promote wound healing, where reflections of the ultrasound by bone tissue, by skin layers, or by internally disposed reflective media propagate toward the wound as longitudinal waves, with shear waves generated by the longitudinal waves for the healing of the wound.
Abstract: A portable therapeutic device and method of use generate longitudinally propagating ultrasound and shear waves generated by such longitudinally propagating ultrasound to provide effective healing of wounds. A transducer having an operative surface is disposed substantially adjacent to the wound to emit ultrasound to propagate in the direction of the wound to promote healing. Reflections of the ultrasound by bone tissue, by skin layers, or by internally disposed reflective media propagate toward the wound as longitudinal waves, with shear waves generated by the longitudinal waves for the healing of the wound. A focusing element is used for focusing the propagation of the ultrasound at a predetermined angle toward the wound. The operative surface of the transducer may be annularly shaped to encircle the wound to convey the ultrasound and/or reflected ultrasound thereto. A housing may be provided for positioning the transducer near a portion of the skin near the wound, and for indenting the skin to form a cavity, with the transducer disposed in the cavity to emit the ultrasound toward an internal surface of the wound. Fixture structures, such as adjustable straps, may extend about a portion of the body to position the transducer near the wound.

PatentDOI
TL;DR: This paper proposes a method for providing a guide for a user in a language translation system, which comprises the steps of receiving (302) an input that is representative of at least one word in a source language, and generating (1402) at least one recognition hypothesis in the source language in response to the input.
Abstract: In a language translation system, a method for providing a guide for a user. In one embodiment, the method comprises the steps of receiving (302) an input that is representative of at least one word in a source language, and generating (1402) at least one recognition hypothesis in the source language in response to the input. The method further comprises the steps of selecting (406) a best hypothesis from the at least one recognition hypothesis in the source language, presenting (408) the best hypothesis in the source language to a user, and presenting (408) alternatives to a portion of the best hypothesis in the source language to the user. The method further includes receiving (410) an indication of a choice of one of the alternatives from the user, and presenting (306) a revised version of the best hypothesis including the alternative chosen to the user.

PatentDOI
TL;DR: A computer is used to perform recorded actions; it permits the user to indicate that the user has reviewed one or more actions, and automatically carries out the actions indicated as having been reviewed.
Abstract: A computer is used to perform recorded actions. The computer receives recorded spoken utterances of actions. The computer then performs speech recognition on the recorded spoken utterances to generate texts of the actions. The computer then parses the texts to determine properties of the actions. After parsing the texts, the computer permits the user to indicate that the user has reviewed one or more actions. The computer then automatically carries out the actions indicated as having been reviewed by the user.

Patent
TL;DR: In this paper, a relatively low power automatic speech recognition system (ASR) is provided in the terminal for recognizing those portions of user-supplied audio input that relate to terminal functions or functions defined by a predefined markup language.
Abstract: Voice control of a service application provided to a terminal from a remote server is distributed between the terminal and a remote application part. A relatively low power automatic speech recognition system (ASR) is provided in the terminal for recognizing those portions of user-supplied audio input that relate to terminal functions or functions defined by a predefined markup language. Recognized words may be used to control the terminal functions, or may alternatively be converted to text and forwarded to the remote server. Unrecognized portions of the audio input may be encoded and forwarded to the remote application part which includes a more powerful ASR. The remote application part may use its ASR to recognize words defined by the application. Recognized words may be converted to text and supplied as input to the remote server. In the reverse direction, text received by the remote application part from the remote server may be converted to an encoded audio output signal, and forwarded to the terminal, which can then generate a signal to be supplied to a loudspeaker. In this way, a voice control mechanism is used in place of the remote server's visual display output and keyboard input.

Journal ArticleDOI
TL;DR: It is shown in this article that measurements of velocity as well as attenuation are subject to biases, and that by using a low-frequency transient excitation the precise numerical values of elasticity and viscosity can be deduced.
Abstract: Several methods have been proposed to estimate the viscoelastic properties of soft biological tissues using forced low-frequency vibrations (10-500 Hz). Those methods are based on the measurement of the phase velocity of the shear waves (approximately 5 m/s). It is shown in this article that the measurements of velocity as well as attenuation are subject to biases. These biases are related to reflected waves created at boundaries, to the nonnegligible size of the piston source, which causes diffraction effects, and to the influence of a low-frequency compressional wave. Indeed, a theoretical analysis of the field radiated by a point source explains how mechanical vibrations of a piston generate a shear wave with a longitudinal component and how this component can interfere with a low-frequency compressional wave. However, by using a low-frequency transient excitation, these biases can be avoided. Then the precise numerical values of elasticity and viscosity can be deduced. Experiments in phantoms and beef muscles are shown. Moreover, a relative hardness imaging of a phantom composed of two media with different elasticities is presented.
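
For orientation, the dispersion relation commonly used to convert such shear-wave speed measurements into elasticity and viscosity values is the Voigt model, which reduces to c = sqrt(mu/rho) for a purely elastic medium. A short sketch (density and example values illustrative):

```python
import numpy as np

def voigt_shear_speed(freq_hz, mu_pa, eta_pa_s, rho=1000.0):
    """Shear-wave phase velocity (m/s) in a Voigt solid with elasticity
    mu_pa (Pa) and viscosity eta_pa_s (Pa.s); rho in kg/m^3. Standard
    dispersion relation used in transient elastography."""
    w = 2 * np.pi * np.asarray(freq_hz, dtype=float)
    m = np.sqrt(mu_pa**2 + (w * eta_pa_s) ** 2)
    return np.sqrt(2.0 * m**2 / (rho * (mu_pa + m)))

# With eta = 0 this reduces to c = sqrt(mu/rho): a ~5 m/s shear wave
# implies mu ~ 25 kPa in tissue-like material.
print(voigt_shear_speed([50, 100, 500], mu_pa=25e3, eta_pa_s=5.0))
```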

Journal ArticleDOI
TL;DR: Results showed that the ability to take advantage of surface phonetic information, such as a consistent talker across items, is a perceptual skill that transfers easily from first to second language perception, but non-native listeners had particular difficulty with lexically hard words even when familiarity with the items was controlled, suggesting that non-native word recognition may be compromised when fine phonetic discrimination at the segmental level is required.
Abstract: In order to gain insight into the interplay between the talker-, listener-, and item-related factors that influence speech perception, a large multi-talker database of digitally recorded spoken words was developed, and was then submitted to intelligibility tests with multiple listeners. Ten talkers produced two lists of words at three speaking rates. One list contained lexically "easy" words (words with few phonetically similar sounding "neighbors" with which they could be confused), and the other list contained lexically "hard" words (words with many phonetically similar sounding "neighbors"). An analysis of the intelligibility data obtained with native speakers of English (experiment 1) showed a strong effect of lexical similarity. Easy words had higher intelligibility scores than hard words. A strong effect of speaking rate was also found whereby slow and medium rate words had higher intelligibility scores than fast rate words. Finally, a relationship was also observed between the various stimulus factors whereby the perceptual difficulties imposed by one factor, such as a hard word spoken at a fast rate, could be overcome by the advantage gained through the listener's experience and familiarity with the speech of a particular talker. In experiment 2, the investigation was extended to another listener population, namely, non-native listeners. Results showed that the ability to take advantage of surface phonetic information, such as a consistent talker across items, is a perceptual skill that transfers easily from first to second language perception. However, non-native listeners had particular difficulty with lexically hard words even when familiarity with the items was controlled, suggesting that non-native word recognition may be compromised when fine phonetic discrimination at the segmental level is required. Taken together, the results of this study provide insight into the signal-dependent and signal-independent factors that influence spoken language processing in native and non-native listeners.
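
The lexical "easy"/"hard" distinction rests on counting phonological neighbors: lexicon entries one phoneme substitution, deletion, or insertion away. A toy sketch over a hypothetical mini-lexicon in a rough ARPAbet-like transcription:

```python
def neighbors(word, lexicon):
    """Entries reachable from `word` by one phoneme substitution,
    deletion, or insertion; words are tuples of phoneme symbols."""
    def one_edit_apart(a, b):
        if abs(len(a) - len(b)) > 1:
            return False
        if len(a) == len(b):                       # substitution
            return sum(x != y for x, y in zip(a, b)) == 1
        if len(a) < len(b):                        # make `a` the longer word
            a, b = b, a
        return any(a[:i] + a[i + 1:] == b for i in range(len(a)))

    return [w for w in lexicon if w != word and one_edit_apart(word, w)]

lexicon = [("k", "ae", "t"), ("b", "ae", "t"), ("k", "ae", "b"),
           ("k", "ah", "t"), ("ae", "t"), ("d", "ao", "g")]
print(neighbors(("k", "ae", "t"), lexicon))
# "cat" has many neighbors here, making it lexically "hard";
# "dog" has none, making it "easy".
```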

Journal ArticleDOI
TL;DR: An optimal frequency scale factor, chosen so as to minimize inter-subject spectral differences, aligned spectral features between subjects; the optimal scale factor between any pair of subjects correlated highly with the ratios of the subjects' maximum interaural delays, the sizes of their external ears, and the widths of their heads.
Abstract: This study examined inter-subject differences in the transfer functions from the free field to the human ear canal, which are commonly known as head-related transfer functions. The directional components of such transfer functions are referred to here as directional transfer functions (DTFs). The DTFs of 45 subjects varied systematically among subjects in regard to the frequencies of spectral features such as peaks and notches. Inter-subject spectral differences in DTFs were quantified between 3.7 and 12.9 kHz for sound-source directions throughout the coordinate sphere. For each pair of subjects, an optimal frequency scale factor aligned spectral features between subjects and, thus, minimized inter-subject spectral differences. Frequency scaling of DTFs reduced spectral differences by a median value of 15.5% across all pairs of subjects and by more than half in 9.5% of subject pairs. Optimal scale factors showed a median value of 1.061 and a maximum of 1.38. The optimal scale factor between any pair of subjects correlated highly with the ratios of subjects’ maximum interaural delays, sizes of their external ears, and widths of their heads.
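
A simplified reconstruction of the alignment idea (not necessarily the authors' exact procedure): grid-search the frequency scale factor that minimizes the RMS dB difference between two DTF magnitude spectra over the 3.7-12.9 kHz band:

```python
import numpy as np

def optimal_scale_factor(freqs, dtf_a_db, dtf_b_db, lo=3.7e3, hi=12.9e3,
                         factors=np.linspace(1.0, 1.4, 81)):
    """Grid-search the frequency scale factor aligning two DTF magnitude
    spectra (dB): evaluate B on a rescaled frequency axis for each
    candidate factor and keep the one minimizing the RMS difference in
    the lo-hi band. Assumes `freqs` (Hz, ascending) extends beyond
    hi * factors.max(), and that the subject pair is ordered so the
    optimal factor is >= 1."""
    band = (freqs >= lo) & (freqs <= hi)
    best_k, best_err = 1.0, np.inf
    for k in factors:
        b_scaled = np.interp(freqs * k, freqs, dtf_b_db)  # B at scaled freqs
        err = np.sqrt(np.mean((dtf_a_db[band] - b_scaled[band]) ** 2))
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err
```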

Journal ArticleDOI
TL;DR: After just nine 20-min sessions of connected discourse tracking with the shifted simulation, performance improved significantly for the identification of intervocalic consonants, medial vowels in monosyllables, and words in sentences; listeners were able to track connected discourse of shifted signals without lipreading at rates up to 40 words per minute.
Abstract: Multi-channel cochlear implants typically present spectral information to the wrong “place” in the auditory nerve array, because electrodes can only be inserted partway into the cochlea. Although such spectral shifts are known to cause large immediate decrements in performance in simulations, the extent to which listeners can adapt to such shifts has yet to be investigated. Here, the effects of a four-channel implant in normal listeners have been simulated, and performance tested with unshifted spectral information and with the equivalent of a 6.5-mm basalward shift on the basilar membrane (1.3–2.9 octaves, depending on frequency). As expected, the unshifted simulation led to relatively high levels of mean performance (e.g., 64% of words in sentences correctly identified) whereas the shifted simulation led to very poor results (e.g., 1% of words). However, after just nine 20-min sessions of connected discourse tracking with the shifted simulation, performance improved significantly for the identification of intervocalic consonants, medial vowels in monosyllables, and words in sentences (30% of words). Also, listeners were able to track connected discourse of shifted signals without lipreading at rates up to 40 words per minute. Although we do not know if complete adaptation to the shifted signals is possible, it is clear that short-term experiments seriously exaggerate the long-term consequences of such spectral shifts.
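
The frequency extent of such a shift can be estimated from Greenwood's human frequency-place map; a sketch using the standard human constants (the study's exact mapping may differ):

```python
import numpy as np

def greenwood_freq(place_mm, length_mm=35.0):
    """Greenwood map for humans: frequency (Hz) at `place_mm` from the apex."""
    return 165.4 * (10.0 ** (2.1 * place_mm / length_mm) - 0.88)

def greenwood_place(freq_hz, length_mm=35.0):
    """Inverse map: cochlear place (mm from apex) for a frequency (Hz)."""
    return length_mm / 2.1 * np.log10(freq_hz / 165.4 + 0.88)

# A 6.5-mm basalward shift sends each analysis band to the place that
# normally codes a much higher frequency -- more octaves apically than
# basally, consistent in direction with the 1.3-2.9 octave range quoted:
for f in (250.0, 1000.0, 4000.0):
    shifted = greenwood_freq(greenwood_place(f) + 6.5)
    print(f"{f:6.0f} Hz -> {shifted:7.0f} Hz ({np.log2(shifted / f):.2f} oct)")
```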

PatentDOI
TL;DR: A system for medical ultrasound is proposed in which the ultrasound probe is positioned by a robot arm under the shared control of the ultrasound operator and the computer.
Abstract: A system for medical ultrasound is proposed in which the ultrasound probe is positioned by a robot arm under the shared control of the ultrasound operator and the computer. The system comprises a robot arm design suitable for diagnostic ultrasound, a passive or active hand-controller, and a computer system to co-ordinate the motion and forces of the robot and hand-controller as a function of operator input, sensed parameters, and ultrasound images.

Journal ArticleDOI
TL;DR: The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences, and the new methodology proposed appears to be well suited to the study of language discrimination.
Abstract: This paper proposes a new experimental paradigm to explore the discriminability of languages, a question which is crucial to the child born in a bilingual environment. This paradigm employs the speech resynthesis technique, enabling the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm, or intonation from natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm, and intonation (condition 1), rhythm and intonation (condition 2), intonation only (condition 3), or rhythm only (condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. Thus, the new methodology proposed appears to be well suited to study language discrimination. Applications for other domains of psycholinguistic research and for automatic language identification are considered.