
Showing papers in "Journal of the Acoustical Society of America in 2005"


Journal ArticleDOI
TL;DR: A computational model of auditory analysis is described that is inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system and provides a unified multiresolution representation of the spectral and temporal features likely critical in the perception of sound.
Abstract: A computational model of auditory analysis is described that is inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system. The model provides a unified multiresolution representation of the spectral and temporal features likely critical in the perception of sound. Simplified, more specifically tailored versions of this model have already been validated by successful application in the assessment of speech intelligibility [Elhilali et al., Speech Commun. 41(2-3), 331-348 (2003); Chi et al., J. Acoust. Soc. Am. 106, 2719-2732 (1999)] and in explaining the perception of monaural phase sensitivity [R. Carlyon and S. Shamma, J. Acoust. Soc. Am. 114, 333-348 (2003)]. Here we provide a more complete mathematical formulation of the model, illustrating how complex signals are transformed through various stages of the model, and relating it to comparable existing models of auditory processing. Furthermore, we outline several reconstruction algorithms to resynthesize the sound from the model output so as to evaluate the fidelity of the representation and the contribution of different features and cues to the sound percept.

635 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a model applicable to ultrasound contrast agent bubbles that takes into account the physical properties of a lipid monolayer coating on a gas microbubble, including buckling radius, the compressibility of the shell, and a break-up shell tension.
Abstract: We present a model applicable to ultrasound contrast agent bubbles that takes into account the physical properties of a lipid monolayer coating on a gas microbubble. Three parameters describe the properties of the shell: a buckling radius, the compressibility of the shell, and a break-up shell tension. The model presents an original non-linear behavior at large amplitude oscillations, termed compression-only, induced by the buckling of the lipid monolayer. This prediction is validated by experimental recordings with the high-speed camera Brandaris 128, operated at several millions of frames per second. The effect of aging, or the result of repeated acoustic pressure pulses on bubbles, is predicted by the model. It corrects a flaw in the shell elasticity term previously used in the dynamical equation for coated bubbles. The break-up is modeled by a critical shell tension above which gas is directly exposed to water.

579 citations
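The shell description above reduces to a piecewise effective surface tension in three regimes. The sketch below follows the general form the abstract describes (zero tension when buckled, an elastic regime governed by the shell compressibility, rupture above a break-up tension); the specific parameter values and the bare-interface tension of 0.072 N/m for water are illustrative assumptions, not the paper's fitted values.

```python
def effective_surface_tension(R, R_buckling, chi, sigma_breakup):
    """Effective surface tension of a lipid-coated microbubble (sketch).

    R             : current bubble radius (m)
    R_buckling    : radius below which the monolayer buckles (m)
    chi           : shell elastic (compressibility) modulus (N/m)
    sigma_breakup : tension above which the shell ruptures (N/m)
    """
    if R <= R_buckling:
        return 0.0                                   # buckled shell carries no tension
    sigma = chi * ((R / R_buckling) ** 2 - 1.0)      # elastic regime
    if sigma >= sigma_breakup:
        return 0.072                                 # ruptured: bare gas-water interface
    return sigma
```

The zero-tension buckled branch is what produces the "compression-only" behavior: on compression the shell buckles and the restoring tension vanishes, so the response is asymmetric between compression and expansion.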


Journal ArticleDOI
TL;DR: The theory behind the demonstration that the Green's function between two points could be recovered using the cross-correlation function of the ambient noise measured at these two points is investigated in the simple case of a homogeneous medium with attenuation.
Abstract: It has been experimentally demonstrated that the Green’s function between two points could be recovered using the cross-correlation function of the ambient noise measured at these two points. This paper investigates the theory behind this result in the simple case of a homogeneous medium with attenuation.

410 citations
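The core claim, that the cross-correlation of diffuse noise recorded at two points converges toward the Green's function between them, can be illustrated with a toy 1-D simulation. Everything below (geometry, sample rate, source distribution) is an illustrative assumption; in 1-D with uncorrelated noise sources beyond both receivers, the averaged cross-correlation peaks at the inter-receiver travel time.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1000          # sample rate (Hz)
c = 100.0          # wave speed (m/s)
n = 20000          # samples of noise per source
x1, x2 = 0.0, 10.0 # receiver positions (m); travel time = 0.1 s = 100 samples

cc = np.zeros(n)
for xs in range(-50, -20):          # uncorrelated noise sources left of both receivers
    noise = rng.standard_normal(n)
    d1 = int((x1 - xs) / c * fs)    # delay (samples) from source to receiver 1
    d2 = int((x2 - xs) / c * fs)    # delay to receiver 2 (always d1 + 100 here)
    r1 = np.roll(noise, d1)         # circular delay stands in for propagation
    r2 = np.roll(noise, d2)
    # circular cross-correlation C(tau) = sum_t r1(t) * r2(t + tau), via FFT
    cc += np.fft.ifft(np.fft.fft(r2) * np.conj(np.fft.fft(r1))).real

lag = np.argmax(cc)                 # peak at the inter-receiver travel time
print(lag)                          # 100 samples = 0.1 s
```

Sources on the other side of the pair would build the anticausal peak at the negative lag; the paper's contribution is the theory of how attenuation modifies this picture.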


PatentDOI
TL;DR: In this paper, a wireless audio distribution system has a wireless transmitter, responsive to a plurality of audio input channels, for transmitting an encoded digital bitstream serially combining each of the audio input channels, with control data dispersed therein.
Abstract: A wireless audio distribution system having a wireless transmitter, responsive to a plurality of audio input channels, for transmitting an encoded digital bitstream serially combining each of the audio input channels, the encoded digital bitstream further including control data dispersed therein; a receiver, responsive to the transmitted encoded digital bitstream, for decoding and demultiplexing the digital bitstream; a manual selector switch, connected to the receiver device, for selecting one or more of the audio input channels to be reproduced; and a sound producing device for selectively reproducing the one or more selected audio channels in accordance with the control data.

400 citations


Journal ArticleDOI
TL;DR: A method is presented by which the wavenumbers for a one-dimensional waveguide can be predicted from a finite element (FE) model, which involves postprocessing a conventional, but low order, FE model, the mass and stiffness matrices of which are typically found using a conventional FE package.
Abstract: A method is presented by which the wavenumbers for a one-dimensional waveguide can be predicted from a finite element (FE) model. The method involves postprocessing a conventional, but low order, FE model, the mass and stiffness matrices of which are typically found using a conventional FE package. This is in contrast to the most popular previous waveguide/FE approach, sometimes termed the spectral finite element approach, which requires new spectral element matrices to be developed. In the approach described here, a section of the waveguide is modeled using conventional FE software and the dynamic stiffness matrix formed. A periodicity condition is applied, the wavenumbers following from the eigensolution of the resulting transfer matrix. The method is described, estimation of wavenumbers, energy, and group velocity discussed, and numerical examples presented. These concern wave propagation in a beam and a simply supported plate strip, for which analytical solutions exist, and the more complex case of a viscoelastic laminate, which involves postprocessing an ANSYS FE model. The method is seen to yield accurate results for the wavenumbers and group velocities of both propagating and evanescent waves.

400 citations
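The procedure the abstract outlines (form the dynamic stiffness matrix of one short FE section, apply a periodicity condition, and read wavenumbers off the eigensolution) can be sketched for the simplest possible waveguide: a rod modeled with one 2-node axial element. The material values below are illustrative assumptions; for a rod the result can be checked against the analytical wavenumber k = ω√(ρ/E).

```python
import numpy as np

E, rho, A, L = 2.0e11, 7800.0, 1e-4, 0.01   # illustrative steel rod, 1 cm element
omega = 2 * np.pi * 1000.0                  # analysis frequency (rad/s)

# Conventional 2-node rod element: stiffness and consistent mass matrices
K = (E * A / L) * np.array([[1.0, -1.0], [-1.0, 1.0]])
M = (rho * A * L / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])

# Dynamic stiffness matrix of the section, partitioned left/right
D = K - omega**2 * M
DLL, DLR, DRL, DRR = D[0, 0], D[0, 1], D[1, 0], D[1, 1]

# Periodicity u_R = lam * u_L plus force balance at the cell junction gives
#   DLR*lam^2 + (DLL + DRR)*lam + DRL = 0,  with lam = exp(-i*k*L)
lam = np.roots([DLR, DLL + DRR, DRL])
k_wfe = np.abs(1j * np.log(lam.astype(complex)) / L)   # wavenumbers (rad/m)

k_exact = omega * np.sqrt(rho / E)          # analytical rod wavenumber
print(k_wfe, k_exact)
```

The two eigenvalues are a reciprocal pair representing the left- and right-going waves; with a section much shorter than a wavelength, the FE wavenumber agrees with the analytical one to a fraction of a percent, which is the point of the postprocessing approach.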


Journal ArticleDOI
TL;DR: A database covering seven dialects of British and Irish English and three different styles of speech was explored to find acoustic correlates of prominence, and fundamental frequency (f0) played a minor role in distinguishing prominent syllables from the rest of the utterance.
Abstract: We explored a database covering seven dialects of British and Irish English and three different styles of speech to find acoustic correlates of prominence. We built classifiers, trained the classifiers on human prominence/nonprominence judgments, and then evaluated how well they behaved. The classifiers operate on 452 ms windows centered on syllables, using different acoustic measures. By comparing the performance of classifiers based on different measures, we can learn how prominence is expressed in speech. Contrary to textbooks and common assumption, fundamental frequency (f0) played a minor role in distinguishing prominent syllables from the rest of the utterance. Instead, speakers primarily marked prominence with patterns of loudness and duration. Two other acoustic measures that we examined also played a minor role, comparable to f0. All dialects and speaking styles studied here share a common definition of prominence. The result is robust to differences in labeling practice and the dialect of the labeler.

379 citations


Journal ArticleDOI
TL;DR: The present study utilizes the residual acoustic hearing in the nonimplanted ear in five cochlear implant users to elucidate the role of temporal fine structure at low frequencies in auditory perception and to test the hypothesis that combined acoustic and electric hearing produces better performance than either mode alone.
Abstract: Speech recognition in noise and music perception is especially challenging for current cochlear implant users. The present study utilizes the residual acoustic hearing in the nonimplanted ear in five cochlear implant users to elucidate the role of temporal fine structure at low frequencies in auditory perception and to test the hypothesis that combined acoustic and electric hearing produces better performance than either mode alone. The first experiment measured speech recognition in the presence of competing noise. It was found that, although the residual low-frequency (<1000 Hz) acoustic hearing produced essentially no recognition for speech recognition in noise, it significantly enhanced performance when combined with the electric hearing. The second experiment measured melody recognition in the same group of subjects and found that, contrary to the speech recognition result, the low-frequency acoustic hearing produced significantly better performance than the electric hearing. It is hypothesized that listeners with combined acoustic and electric hearing might use the correlation between the salient pitch in low-frequency acoustic hearing and the weak pitch in the envelope to enhance segregation between signal and noise. The present study suggests the importance and urgency of accurately encoding the fine-structure cue in cochlear implants.

364 citations


Journal ArticleDOI
TL;DR: Data gathered at various hospitals over the last 45 years indicate a trend of increasing noise levels during daytime and nighttime hours, and no location is in compliance with current World Health Organization Guidelines.
Abstract: This article presents the results of a noise survey at Johns Hopkins Hospital in Baltimore, MD. Results include equivalent sound pressure levels (Leq) as a function of location, frequency, and time of day. At all locations and all times of day, the Leq indicate that a serious problem exists. No location is in compliance with current World Health Organization Guidelines, and a review of objective data indicates that this is true of hospitals throughout the world. Average equivalent sound levels are in the 50-60 dB(A) range for 1-min, 1/2-h, and 24-h averaging time periods. The spectra are generally flat over the 63-2000 Hz octave bands, with higher sound levels at lower frequencies, and a gradual roll off above 2000 Hz. Many units exhibit little if any reduction of sound levels in the nighttime. Data gathered at various hospitals over the last 45 years indicate a trend of increasing noise levels during daytime and nighttime hours. The implications of these results are significant for patients, visitors, and hospital staff.

354 citations
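Equivalent sound levels over different averaging periods combine on an energy basis, not an arithmetic one, which is why the same survey can quote Leq for 1-min and 24-h windows. A short sketch of the standard Leq formula (the level samples are hypothetical, not data from the survey):

```python
import math

def leq(levels_db):
    """Equivalent continuous sound level of equal-duration SPL samples.

    Leq = 10*log10(mean of 10^(Li/10)): an energy average, so loud
    intervals dominate the result.
    """
    energies = [10 ** (l / 10.0) for l in levels_db]
    return 10.0 * math.log10(sum(energies) / len(energies))

# Half a period at 60 dB(A) and half at 70 dB(A) average to ~67.4 dB(A),
# not 65: the energy average is pulled toward the louder interval.
print(round(leq([60.0, 70.0]), 1))
```

This pull toward loud events is one reason occasional nighttime alarms keep hospital Leq values high even when the ward is mostly quiet.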


Journal ArticleDOI
TL;DR: Zero-thickness interface models are developed to describe the encapsulation of microbubble contrast agents with rheological parameters such as surface tension, surface dilatational viscosity, and surface dilatational elasticity to characterize a widely used microbubble-based ultrasound contrast agent.
Abstract: Zero-thickness interface models are developed to describe the encapsulation of microbubble contrast agents. Two different rheological models of the interface, Newtonian (viscous) and viscoelastic, with rheological parameters such as surface tension, surface dilatational viscosity, and surface dilatational elasticity are presented to characterize the encapsulation. The models are applied to characterize a widely used microbubble-based ultrasound contrast agent. Attenuation of ultrasound passing through a solution of contrast agent is measured. The model parameters for the contrast agent are determined by matching the linearized model dynamics with measured attenuation data. The models are investigated for their ability to match other experiments. Specifically, model predictions are compared with scattered fundamental and subharmonic responses. Experiments and model prediction results are discussed along with those obtained using an existing model [Church, J. Acoust. Soc. Am. 97, 1510 (1995) and Hoff et al., J. Acoust. Soc. Am. 107, 2272 (2000)] of contrast agents.

269 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe how to effectively measure, model, design and apply diffusers and absorbers and provide an overview of the evolution, characteristics and application of modern diffusers.
Abstract: Absorbers and diffusers are two of the main design tools for altering the acoustic conditions of rooms, semi-enclosed spaces and the outdoor environment. Their correct use is important for delivering high quality acoustics. Unique and authoritative, this book describes how to effectively measure, model, design and apply diffusers and absorbers. It is a resource for new and experienced acousticians, seeking an understanding of the evolution, characteristics and application of modern diffusers. Absorption is a more established technology and so the book blends traditional designs with modern developments. The book covers practical and theoretical aspects of absorbers and diffusers and is well illustrated with examples of installations and case studies. This new edition brings Acoustic Absorbers and Diffusers up-to-date with current research, practice and standards. New developments in measurement, materials, theory and practice since the first edition (published in 2004) are included. The sections on absorbers are extended to include more about noise control.

267 citations


Journal ArticleDOI
TL;DR: The results of the current experiments suggest that the focus of attention along the spatial dimension can play a very significant role in solving the "cocktail party" problem.
Abstract: This study examined the role of focused attention along the spatial (azimuthal) dimension in a highly uncertain multitalker listening situation. The task of the listener was to identify key words from a target talker in the presence of two other talkers simultaneously uttering similar sentences. When the listener had no a priori knowledge about target location, or which of the three sentences was the target sentence, performance was relatively poor, near the value expected simply from choosing to focus attention on only one of the three locations. When the target sentence was cued before the trial, but location was uncertain, performance improved significantly relative to the uncued case. When spatial location information was provided before the trial, performance improved significantly for both cued and uncued conditions. If the location of the target was certain, proportion correct identification performance was higher than 0.9, independent of whether the target was cued beforehand. In contrast to studies in which known versus unknown spatial locations were compared for relatively simple stimuli and tasks, the results of the current experiments suggest that the focus of attention along the spatial dimension can play a very significant role in solving the "cocktail party" problem.

Journal ArticleDOI
TL;DR: The findings indicate that the vowel systems of American English are better characterized in terms of the region of origin of the talkers than in terms of a single set of idealized acoustic-phonetic baselines of "General" American English.
Abstract: Previous research by speech scientists on the acoustic characteristics of American English vowel systems has typically focused on a single regional variety, despite decades of sociolinguistic research demonstrating the extent of regional phonological variation in the United States. In the present study, acoustic measures of duration and first and second formant frequencies were obtained from five repetitions of 11 different vowels produced by 48 talkers representing both genders and six regional varieties of American English. Results revealed consistent variation due to region of origin, particularly with respect to the production of low vowels and high back vowels. The Northern talkers produced shifted low vowels consistent with the Northern Cities Chain Shift, the Southern talkers produced fronted back vowels consistent with the Southern Vowel Shift, and the New England, Midland, and Western talkers produced the low back vowel merger. These findings indicate that the vowel systems of American English are better characterized in terms of the region of origin of the talkers than in terms of a single set of idealized acoustic-phonetic baselines of “General” American English and provide benchmark data for six regional varieties.

Journal ArticleDOI
TL;DR: An extension to the SII model is proposed with the aim of predicting speech intelligibility in both stationary and fluctuating noise; it gives a good account of speech reception threshold (SRT) data from the literature.
Abstract: The SII model in its present form (ANSI S3.5-1997, American National Standards Institute, New York) can accurately describe intelligibility for speech in stationary noise but fails to do so for nonstationary noise maskers. Here, an extension to the SII model is proposed with the aim to predict the speech intelligibility in both stationary and fluctuating noise. The basic principle of the present approach is that both speech and noise signal are partitioned into small time frames. Within each time frame the conventional SII is determined, yielding the speech information available to the listener at that time frame. Next, the SII values of these time frames are averaged, resulting in the SII for that particular condition. Using speech reception threshold (SRT) data from the literature, the extension to the present SII model can give a good account for SRTs in stationary noise, fluctuating speech noise, interrupted noise, and multiple-talker noise. The predictions for sinusoidally intensity modulated (SIM) noise and real speech or speech-like maskers are better than with the original SII model, but are still not accurate. For the latter type of maskers, informational masking may play a role.
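The averaging principle described above is easy to sketch. The code below is a simplified stand-in, not the ANSI S3.5 procedure: it uses the SII-style band audibility function, clip((SNR + 15)/30, 0, 1), with made-up band-importance weights, computes a per-frame index, and averages across frames as the extension proposes.

```python
import numpy as np

def frame_sii(speech_db, noise_db, importance):
    """Simplified SII for one time frame (band levels in dB)."""
    snr = np.asarray(speech_db, dtype=float) - np.asarray(noise_db, dtype=float)
    audibility = np.clip((snr + 15.0) / 30.0, 0.0, 1.0)   # SII-style band audibility
    return float(np.sum(importance * audibility))

def extended_sii(speech_frames, noise_frames, importance):
    """Average the per-frame SII over all time frames (the proposed extension)."""
    return float(np.mean([frame_sii(s, n, importance)
                          for s, n in zip(speech_frames, noise_frames)]))

# Illustrative 4-band example with equal (hypothetical) importance weights
w = np.full(4, 0.25)
speech = [[65, 60, 55, 50]] * 2                 # two frames of speech band levels
steady = [[50, 50, 50, 50]] * 2                 # stationary masker
fluct  = [[50, 50, 50, 50], [20, 20, 20, 20]]   # masker with a dip in frame 2
print(extended_sii(speech, steady, w), extended_sii(speech, fluct, w))
```

The fluctuating masker yields a higher average index because the listener "glimpses" extra speech information during the dip, which is exactly the masking release the conventional, long-term SII fails to predict.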

PatentDOI
TL;DR: In this paper, a method is provided for low-frequency emphasizing the spectrum of a sound signal transformed in a frequency domain and comprising transform coefficients grouped in a number of blocks, in which a maximum energy for one block is calculated and a position index of the block with maximum energy is determined, a factor is calculated for each block having a position index smaller than the position index of the block with maximum energy, and for each block a gain is determined from the factor and is applied to the transform coefficients of the block.
Abstract: An aspect of the present invention relates to a method for low-frequency emphasizing the spectrum of a sound signal transformed in a frequency domain and comprising transform coefficients grouped in a number of blocks, in which a maximum energy for one block is calculated and a position index of the block with maximum energy is determined, a factor is calculated for each block having a position index smaller than the position index of the block with maximum energy, and for each block a gain is determined from the factor and is applied to the transform coefficients of the block.
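The claimed procedure can be paraphrased in code. The gain rule below is an illustrative placeholder (the abstract does not disclose the exact factor computation): find the block of maximum energy, then boost every block with a smaller position index toward that energy, leaving blocks at or above the max-energy position untouched.

```python
import numpy as np

def low_freq_emphasis(coeffs, block_size=8, max_gain=2.0):
    """Low-frequency emphasis of transform coefficients (sketch).

    coeffs     : 1-D array of frequency-domain transform coefficients
    block_size : coefficients per block
    max_gain   : cap on the per-block gain (illustrative, not from the patent)
    """
    blocks = coeffs.reshape(-1, block_size).astype(float)
    energy = np.sum(blocks ** 2, axis=1)
    i_max = int(np.argmax(energy))            # position index of the max-energy block
    for i in range(i_max):                    # only blocks below the max-energy index
        # Illustrative factor/gain rule: pull quiet low blocks up toward E_max
        factor = energy[i_max] / max(energy[i], 1e-12)
        gain = min(factor ** 0.25, max_gain)
        blocks[i] *= gain
    return blocks.reshape(-1)

x = np.concatenate([0.1 * np.ones(8), 4.0 * np.ones(8), 0.1 * np.ones(8)])
y = low_freq_emphasis(x, block_size=8)
```

Only the first block (index 0, below the max-energy block at index 1) is amplified; the blocks at and above the max-energy position pass through unchanged, matching the claim's "position index smaller than" condition.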

Journal ArticleDOI
TL;DR: Quantitative characteristics of clicks from deep-diving Cuvier's beaked whales (Ziphius cavirostris) are reported using a unique data set, enhancing the potential for passive detection.
Abstract: Strandings of beaked whales of the genera Ziphius and Mesoplodon have been reported to occur in conjunction with naval sonar use. Detection of the sounds from these elusive whales could reduce the risk of exposure, but descriptions of their vocalizations are at best incomplete. This paper reports quantitative characteristics of clicks from deep-diving Cuvier’s beaked whales (Ziphius cavirostris) using a unique data set. Two whales in the Ligurian Sea were simultaneously tagged with sound and orientation recording tags, and the dive tracks were reconstructed allowing for derivation of the range and relative aspect between the clicking whales. At depth, the whales produced trains of regular echolocation clicks with mean interclick intervals of 0.43 s (±0.09) and 0.40 s (±0.07). The clicks are frequency modulated pulses with durations of ∼200 μs and center frequencies around 42 kHz, −10 dB bandwidths of 22 kHz, and Q3 dB of 4. The sound beam is narrow with an estimated directionality index of more than 25 dB...

Journal ArticleDOI
TL;DR: Listeners presented with carefully controlled synthetic tones use attack time, spectral centroid, and spectrum fine structure in dissimilarity rating experiments, and spectral flux appears as a less salient timbre parameter, its salience depending on the number of other dimensions varying concurrently in the stimulus set.
Abstract: Timbre spaces represent the organization of perceptual distances, as measured with dissimilarity ratings, among tones equated for pitch, loudness, and perceived duration. A number of potential acoustic correlates of timbre-space dimensions have been proposed in the psychoacoustic literature, including attack time, spectral centroid, spectral flux, and spectrum fine structure. The experiments reported here were designed as direct tests of the perceptual relevance of these acoustical parameters for timbre dissimilarity judgments. Listeners presented with carefully controlled synthetic tones use attack time, spectral centroid, and spectrum fine structure in dissimilarity rating experiments. These parameters thus appear as major determinants of timbre. However, spectral flux appears as a less salient timbre parameter, its salience depending on the number of other dimensions varying concurrently in the stimulus set. Dissimilarity ratings were analyzed with two different multidimensional scaling models (CLASCAL and CONSCAL), the latter providing psychophysical functions constrained by the physical parameters. Their complementarity is discussed.

Journal ArticleDOI
TL;DR: The results indicate that the degree of spectral peak resolution required for accurate vowel and consonant recognition in quiet backgrounds is around 4 ripples/octave, and that spectral peak resolution poorer than around 1-2 ripples/octave may result in highly degraded speech recognition.
Abstract: Spectral peak resolution was investigated in normal hearing (NH), hearing impaired (HI), and cochlear implant (CI) listeners. The task involved discriminating between two rippled noise stimuli in which the frequency positions of the log-spaced peaks and valleys were interchanged. The ripple spacing was varied adaptively from 0.13 to 11.31 ripples/octave, and the minimum ripple spacing at which a reversal in peak and trough positions could be detected was determined as the spectral peak resolution threshold for each listener. Spectral peak resolution was best, on average, in NH listeners, poorest in CI listeners, and intermediate for HI listeners. There was a significant relationship between spectral peak resolution and both vowel and consonant recognition in quiet across the three listener groups. The results indicate that the degree of spectral peak resolution required for accurate vowel and consonant recognition in quiet backgrounds is around 4 ripples/octave, and that spectral peak resolution poorer than around 1-2 ripples/octave may result in highly degraded speech recognition. These results suggest that efforts to improve spectral peak resolution for HI and CI users may lead to improved speech recognition.
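The stimulus construction is straightforward to sketch: impose a sinusoidal ripple on a log-frequency amplitude envelope, and create the comparison stimulus by inverting the ripple phase so that peak and trough positions interchange. The parameters below (ripple depth, frequency range, reference frequency) are illustrative assumptions, not the study's values.

```python
import numpy as np

def ripple_envelope_db(freqs, ripples_per_octave, depth_db=30.0,
                       f_ref=100.0, phase=0.0):
    """Log-spaced spectral ripple envelope in dB (sketch).

    phase=0 gives the 'standard' ripple; phase=pi interchanges the
    frequency positions of peaks and valleys (the discrimination task).
    """
    octaves = np.log2(freqs / f_ref)
    return (depth_db / 2.0) * np.sin(2 * np.pi * ripples_per_octave * octaves + phase)

freqs = np.geomspace(100.0, 5000.0, 512)
standard = ripple_envelope_db(freqs, ripples_per_octave=2.0)
inverted = ripple_envelope_db(freqs, ripples_per_octave=2.0, phase=np.pi)
# Inverting the phase flips the envelope: every peak becomes a trough.
print(np.allclose(standard, -inverted, atol=1e-9))
```

As ripples_per_octave grows, the peaks crowd together on the log-frequency axis; once they are narrower than the listener's auditory filters, the standard and inverted envelopes become indistinguishable, which is what the adaptive threshold measures.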

PatentDOI
Fuliang Weng1, Qi Zhang1
TL;DR: In this paper, an advanced model that includes new processes is provided for use as a component of an effective disfluency identifier, which tags edited words in transcribed speech; a speech recognition unit in combination with a part-of-speech tagger, the disfluency identifier, and a parser forms a natural language system that helps machines properly interpret spoken utterances.
Abstract: An advanced model that includes new processes is provided for use as a component of an effective disfluency identifier. The disfluency identifier tags edited words in transcribed speech. A speech recognition unit in combination with a part-of-speech tagger, a disfluency identifier, and a parser form a natural language system that helps machines properly interpret spoken utterances.

Journal ArticleDOI
TL;DR: The results suggest that both category assimilation and perceptual interference affect English /r/ and /l/ acquisition.
Abstract: Recent work [Iverson et al. (2003) Cognition, 87, B47–57] has suggested that Japanese adults have difficulty learning English /r/ and /l/ because they are overly sensitive to acoustic cues that are not reliable for /r/-/l/ categorization (e.g., F2 frequency). This study investigated whether cue weightings are altered by auditory training, and compared the effectiveness of different training techniques. Separate groups of subjects received High Variability Phonetic Training (natural words from multiple talkers), and 3 techniques in which the natural recordings were altered via signal processing (All Enhancement, with F3 contrast maximized and closure duration lengthened; Perceptual Fading, with F3 enhancement reduced during training; and Secondary Cue Variability, with variation in F2 and durations increased during training). The results demonstrated that all of the training techniques improved /r/-/l/ identification by Japanese listeners, but there were no differences between the techniques. Training also altered the use of secondary acoustic cues; listeners became biased to identify stimuli as English /l/ when the cues made them similar to the Japanese /r/ category, and reduced their use of secondary acoustic cues for stimuli that were dissimilar to Japanese /r/. The results suggest that both category assimilation and perceptual interference affect English /r/ and /l/ acquisition.

Journal ArticleDOI
TL;DR: To accommodate these findings, models of speech production should allow word frequency to affect the acoustic realizations of lower-level units, such as individual speech sounds occurring in affixes.
Abstract: This study investigates the effects of lexical frequency on the durational reduction of morphologically complex words in spoken Dutch. The hypothesis that high-frequency words are more reduced than low-frequency words was tested by comparing the durations of affixes occurring in different carrier words. Four Dutch affixes were investigated, each occurring in a large number of words with different frequencies. The materials came from a large database of face-to-face conversations. For each word containing a target affix, one token was randomly selected for acoustic analysis. Measurements were made of the duration of the affix as a whole and the durations of the individual segments in the affix. For three of the four affixes, a higher frequency of the carrier word led to shorter realizations of the affix as a whole, individual segments in the affix, or both. Other relevant factors were the sex and age of the speaker, segmental context, and speech rate. To accommodate these findings, models of speech production should allow word frequency to affect the acoustic realizations of lower-level units, such as individual speech sounds occurring in affixes.

Journal ArticleDOI
TL;DR: Quantitative analyses were undertaken to test the relationship between speaker body size and voice F0 and formant frequencies, both for human vowels and for the vowel-like grunts of baboons; F0 variation did not track body-size variation between the sexes in either species, nor within sexes in humans.
Abstract: Key voice features—fundamental frequency (F0) and formant frequencies—can vary extensively between individuals. Much of the variation can be traced to differences in the size of the larynx and vocal-tract cavities, but whether these differences in turn simply reflect differences in speaker body size (i.e., neutral vocal allometry) remains unclear. Quantitative analyses were therefore undertaken to test the relationship between speaker body size and voice F0 and formant frequencies for human vowels. To test the taxonomic generality of the relationships, the same analyses were conducted on the vowel-like grunts of baboons, whose phylogenetic proximity to humans and similar vocal production biology and voice acoustic patterns recommend them for such comparative research. For adults of both species, males were larger than females and had lower mean voice F0 and formant frequencies. However, beyond this, F0 variation did not track body-size variation between the sexes in either species, nor within sexes in humans. In humans, formant variation correlated significantly with speaker height but only in males and not in females. Implications for general vocal allometry are discussed as are implications for speech origins theories, and challenges to them, related to laryngeal position and vocal tract length.

Journal ArticleDOI
TL;DR: The experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.
Abstract: There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.

Journal ArticleDOI
TL;DR: Judgements of speaker size show that VTL has a strong influence upon perceived speaker size; for vowels with abnormal combinations of low GPRs and short VTLs, the VTL information appears to decide the sex/age judgement.
Abstract: Glottal-pulse rate (GPR) and vocal-tract length (VTL) are related to the size, sex, and age of the speaker but it is not clear how the two factors combine to influence our perception of speaker size, sex, and age. This paper describes experiments designed to measure the effect of the interaction of GPR and VTL upon judgements of speaker size, sex, and age. Vowels were scaled to represent people with a wide range of GPRs and VTLs, including many well beyond the normal range of the population, and listeners were asked to judge the size and sex/age of the speaker. The judgements of speaker size show that VTL has a strong influence upon perceived speaker size. The results for the sex and age categorization (man, woman, boy, or girl) show that, for vowels with GPR and VTL values in the normal range, judgements of speaker sex and age are influenced about equally by GPR and VTL. For vowels with abnormal combinations of low GPRs and short VTLs, the VTL information appears to decide the sex/age judgement.

Journal ArticleDOI
TL;DR: Observations are presented from seven diverse materials showing that anomalous nonlinear fast dynamics (ANFD) and slow dynamics (SD) occur together, significantly expanding the nonlinear mesoscopic elasticity class.
Abstract: Results are reported of the first systematic study of anomalous nonlinear fast dynamics and slow dynamics in a number of solids. Observations are presented from seven diverse materials showing that anomalous nonlinear fast dynamics (ANFD) and slow dynamics (SD) occur together, significantly expanding the nonlinear mesoscopic elasticity class. The materials include samples of gray iron, alumina ceramic, quartzite, cracked Pyrex, marble, sintered metal, and perovskite ceramic. In addition, it is shown that materials which exhibit ANFD have very similar ratios of amplitude-dependent internal-friction to the resonance-frequency shift with strain amplitude. The ratios range between 0.28 and 0.63, except for cracked Pyrex glass, which exhibits a ratio of 1.1, and the ratio appears to be a material characteristic. The ratio of internal friction to resonance frequency shift as a function of time during SD is time independent, ranging from 0.23 to 0.43 for the materials studied.

Journal ArticleDOI
TL;DR: The results support the hypothesis that a cyclic variation of the orifice profile from a convergent to a divergent shape leads to a temporal asymmetry in the average wall pressure, which is the key factor for the achievement of self-sustained vocal fold oscillations.
Abstract: The aerodynamic transfer of energy from glottal airflow to vocal fold tissue during phonation was explored using complementary synthetic and numerical vocal fold models. The synthetic model was fabricated using a flexible polyurethane rubber compound. The model size, shape, and material properties were generally similar to corresponding human vocal fold characteristics. Regular, self-sustained oscillations were achieved at a frequency of approximately 120 Hz. The onset pressure was approximately 1.2 kPa. A corresponding two-dimensional finite element model was developed using geometry definitions and material properties based on the synthetic model. The finite element model upstream and downstream pressure boundary conditions were based on experimental values acquired using the synthetic model. An analysis of the fully coupled fluid and solid numerical domains included flow separation and unsteady effects. The numerical results provided detailed flow data that was used to investigate aerodynamic energy transfer mechanisms. The results support the hypothesis that a cyclic variation of the orifice profile from a convergent to a divergent shape leads to a temporal asymmetry in the average wall pressure, which is the key factor for the achievement of self-sustained vocal fold oscillations. (c) 2005 Acoustical Society of America

Journal ArticleDOI
TL;DR: The results demonstrated that both spectral and temporal cues were important for consonant and vowel recognition with the spectral cues having a greater effect than the temporal cues for the ranges of numbers of channels and lowpass cutoff frequencies tested.
Abstract: Cochlear implants provide users with limited spectral and temporal information. In this study, the amount of spectral and temporal information was systematically varied through simulations of cochlear implant processors using a noise-excited vocoder. Spectral information was controlled by varying the number of channels between 1 and 16, and temporal information was controlled by varying the lowpass cutoff frequencies of the envelope extractors from 1 to 512 Hz. Consonants and vowels processed using those conditions were presented to seven normal-hearing native-English-speaking listeners for identification. The results demonstrated that both spectral and temporal cues were important for consonant and vowel recognition, with the spectral cues having a greater effect than the temporal cues for the ranges of numbers of channels and lowpass cutoff frequencies tested. The lowpass cutoff for asymptotic performance in consonant and vowel recognition was 16 and 4 Hz, respectively. The number of channels at which performance plateaued for consonants and vowels was 8 and 12, respectively. Within the above-mentioned ranges of lowpass cutoff frequency and number of channels, the temporal and spectral cues showed a tradeoff for phoneme recognition. Information transfer analyses showed different relative contributions of spectral and temporal cues in the perception of various phonetic/acoustic features.
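A noise-excited vocoder of the kind described can be sketched as follows; the two variables of the study map onto `n_channels` (spectral detail) and `env_cutoff_hz` (temporal detail). The filter orders, channel spacing, and frequency range here are illustrative assumptions, not the study's processor parameters:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_channels=8, env_cutoff_hz=16.0,
                 f_lo=100.0, f_hi=7000.0):
    """Noise-excited channel vocoder (illustrative sketch).

    Spectral detail is controlled by n_channels, temporal detail by
    env_cutoff_hz, mirroring the two variables of the study."""
    # Log-spaced channel edges between f_lo and f_hi
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    env_sos = butter(2, env_cutoff_hz, btype="low", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)
        env = sosfiltfilt(env_sos, np.abs(band))      # lowpass-filtered envelope
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))
        out += np.clip(env, 0.0, None) * carrier      # envelope-modulated noise band
    return out
```

Lowering `env_cutoff_hz` toward 1 Hz smears the temporal envelope; lowering `n_channels` toward 1 smears the spectrum, which is how the two cue types were traded off.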

Journal ArticleDOI
TL;DR: Results suggest that bias and variability in sound localization behavior may vary systematically with listener location in a room as well as source location relative to the listener, even for nearby sources where there is relatively little reverberant energy.
Abstract: Binaural room impulse responses (BRIRs) were measured in a classroom for sources at different azimuths and distances (up to 1 m) relative to a manikin located in four positions in a classroom. When the listener is far from all walls, reverberant energy distorts signal magnitude and phase independently at each frequency, altering monaural spectral cues, interaural phase differences, and interaural level differences. For the tested conditions, systematic distortion (comb-filtering) from an early intense reflection is only evident when a listener is very close to a wall, and then only in the ear facing the wall. Especially for a nearby source, interaural cues grow less reliable with increasing source laterality and monaural spectral cues are less reliable in the ear farther from the sound source. Reverberation reduces the magnitude of interaural level differences at all frequencies; however, the direct-sound interaural time difference can still be recovered from the BRIRs measured in these experiments. Results suggest that bias and variability in sound localization behavior may vary systematically with listener location in a room as well as source location relative to the listener, even for nearby sources where there is relatively little reverberant energy.
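The claim that the direct-sound interaural time difference can still be recovered from the BRIRs suggests a standard cross-correlation estimate; a hedged sketch with synthetic impulse responses (the function and delay values are illustrative, not the measured BRIRs):

```python
import numpy as np

def itd_from_brir(h_left, h_right, fs):
    """Estimate the interaural time difference (seconds) as the lag that
    maximizes the cross-correlation of the two impulse responses.
    Positive values mean the sound reaches the left ear first."""
    corr = np.correlate(h_left, h_right, mode="full")
    # Zero lag sits at index len(h_right) - 1; argmax gives t_left - t_right
    delay_samples = np.argmax(corr) - (len(h_right) - 1)
    return -delay_samples / fs  # t_right - t_left

# Synthetic example: the same click arrives 10 samples later at the right ear
fs = 48000
h_l = np.zeros(256); h_l[20] = 1.0
h_r = np.zeros(256); h_r[30] = 1.0
print(itd_from_brir(h_l, h_r, fs))  # about +10/48000 s, i.e. roughly 208 microseconds
```

With real BRIRs, windowing the responses to the direct-sound portion before correlating avoids the reverberant tail biasing the peak.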

Journal ArticleDOI
TL;DR: This study examines the growth pattern of the various hard and soft tissue vocal tract structures as visualized by magnetic resonance imaging (MRI), and assesses their relational growth with vocal tract length (VTL).
Abstract: Speech development in children is predicated partly on the growth and anatomic restructuring of the vocal tract. This study examines the growth pattern of the various hard and soft tissue vocal tract structures as visualized by magnetic resonance imaging (MRI), and assesses their relational growth with vocal tract length (VTL). Measurements on lip thickness, hard- and soft-palate length, tongue length, naso-oro-pharyngeal length, mandibular length and depth, and distance of the hyoid bone and larynx from the posterior nasal spine were used from 63 pediatric cases (ages birth to 6 years and 9 months) and 12 adults. Results indicate (a) ongoing growth of all oral and pharyngeal vocal tract structures with no sexual dimorphism, and a period of accelerated growth between birth and 18 months; (b) vocal tract structure’s region (oral/anterior versus pharyngeal/posterior) and orientation (horizontal versus vertical) determine its growth pattern; and (c) the relational growth of the different structures with VTL changes with development—while the increase in VTL throughout development is predominantly due to growth of pharyngeal/posterior structures, VTL is also substantially affected by the growth of oral/anterior structures during the first 18 months of life. Findings provide normative data that can be used for modeling the development of the vocal tract.

Journal ArticleDOI
TL;DR: It is shown that the existing diffuse-field reciprocity relationship can be extended to encompass connections that possess an arbitrary number of degrees of freedom.
Abstract: This analysis is concerned with the derivation of a “diffuse field” reciprocity relationship between the diffuse-field excitation of a connection to a structural or acoustic subsystem and the radiation impedance of the connection. Such a relationship has been derived previously for connections described by a single degree of freedom. In the present work it is shown that the diffuse-field reciprocity relationship also arises when describing the ensemble average response of connections to structural or acoustic subsystems with uncertain boundaries. Furthermore, it is shown that the existing diffuse-field reciprocity relationship can be extended to encompass connections that possess an arbitrary number of degrees of freedom. The present work has application to (i) the calculation of the diffuse-field response of structural-acoustic systems modeled by Finite Elements, Boundary Elements, and Infinite Elements; (ii) the general calculation of the Coupling Loss Factors employed in Statistical Energy Analysis (SEA); and (iii) the derivation of an alternative analysis method for describing the dynamic interactions of coupled subsystems with uncertain boundaries (a generalized “boundary” approach to SEA).
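For orientation, the widely quoted single-connection form of such a diffuse-field reciprocity relationship, which the paper generalizes to connections with many degrees of freedom, can be sketched as below; the symbols and normalization here are illustrative and should be checked against the paper itself:

```latex
% Hedged sketch of the diffuse-field reciprocity relationship, with
% E the subsystem vibrational energy, n(\omega) its modal density,
% D_{\mathrm{dir}} the direct-field dynamic stiffness of the connection,
% and S_{ff} the spectral density of the diffuse-field blocked force.
S_{ff}(\omega) \;=\; \frac{4E}{\pi \omega\, n(\omega)}\,
  \operatorname{Im}\!\left\{ D_{\mathrm{dir}}(\omega) \right\}
```

In the multi-degree-of-freedom case described in the abstract, \(S_{ff}\) becomes a cross-spectral matrix and \(\operatorname{Im}\{\mathbf{D}_{\mathrm{dir}}\}\) the corresponding matrix of the direct-field dynamic stiffness.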

Journal ArticleDOI
TL;DR: The transmission properties of bone conducted sound in human head are presented, measured as the three-dimensional vibration at the cochlear promontory in six intact cadaver heads, found to be nondispersive at frequencies above 2 kHz whereas it altered with frequency at the cranial vault.
Abstract: In the past, only a few investigations have measured vibration at the cochlea with bone conduction stimulation: dry skulls were used in those investigations. In this paper, the transmission properties of bone-conducted sound in the human head are presented, measured as the three-dimensional vibration at the cochlear promontory in six intact cadaver heads. The stimulation was provided at 27 positions on the skull surface and two close to the cochlea; mechanical point impedance was measured at all positions. Cochlear promontory vibration levels in the three perpendicular directions were normally within 5 dB. With the stimulation applied on the ipsilateral side, the response decreased, and the accumulated phase increased, with distance between the cochlea and the excitation position. No significant changes were obtained when the excitations were on the contralateral side. In terms of vibration level, the best stimulation position is on the mastoid close to the cochlea; the worst is at the midline of the skull. The transcranial transmission was close to 0 dB for frequencies up to 700 Hz; above that frequency, it decreased at 12 dB/decade. Wave transmission at the skull base was found to be nondispersive at frequencies above 2 kHz, whereas it varied with frequency at the cranial vault. (c) 2005 Acoustical Society of America
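The reported roll-off can be turned into a quick estimate: at 12 dB/decade, transcranial transmission one decade above the 700 Hz corner (i.e., at 7 kHz) would be about -12 dB. A small sketch (the corner frequency and slope are taken from the abstract; the piecewise function itself is an illustrative assumption):

```python
import math

def transcranial_db(f_hz, corner_hz=700.0, slope_db_per_decade=12.0):
    """Piecewise estimate of transcranial transmission from the reported
    fit: ~0 dB up to the corner, then falling 12 dB/decade (illustrative)."""
    if f_hz <= corner_hz:
        return 0.0
    return -slope_db_per_decade * math.log10(f_hz / corner_hz)

print(transcranial_db(700.0))   # 0.0 dB at the corner
print(transcranial_db(7000.0))  # about -12 dB one decade above
```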