
Showing papers in "Journal of the Acoustical Society of America in 2007"


Journal ArticleDOI
TL;DR: This article reviews Springer Handbook of Speech Processing by Jacob Benesty, Mohan M. Sondhi, Yiteng Huang, 2008, which aims to provide a “roadmap” for the rapidly evolving field of speech processing.
Abstract: This article reviews Springer Handbook of Speech Processing by Jacob Benesty, Mohan M. Sondhi, Yiteng Huang, 2008. 1176 pp. Price: $199.00 (hardcover). ISBN: 978-3-540-49125-5

540 citations


Journal ArticleDOI
TL;DR: This article reviews Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking by Harry L. Van Trees and Kristine L. Bell.
Abstract: This article reviews Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking by Harry L. Van Trees, Kristine L. Bell, Piscataway, New Jersey, 2007. 951 pp. $111.00 (paperback), ISBN: 0470120959

498 citations


Journal ArticleDOI
TL;DR: SWIPE′, a variation of SWIPE, utilizes only the first and prime harmonics of the signal, which significantly reduces subharmonic errors commonly found in other pitch estimation algorithms.
Abstract: A Sawtooth Waveform Inspired Pitch Estimator (SWIPE) has been developed for processing speech and music. SWIPE is shown to outperform existing algorithms on several publicly available speech/musical-instruments databases and a disordered speech database. SWIPE estimates the pitch as the fundamental frequency of the sawtooth waveform whose spectrum best matches the spectrum of the input signal. A decaying cosine kernel provides an extension to older frequency-based, sieve-type estimation algorithms by providing smooth peaks with decaying amplitudes to correlate with the harmonics of the signal. An improvement on the algorithm is achieved by using only the first and prime harmonics, which significantly reduces subharmonic errors commonly found in other pitch estimation algorithms.
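The spectral-matching idea behind SWIPE′ can be sketched in a few lines: score each candidate pitch by the spectral energy at its first and prime harmonics, with decaying weights, and pick the best-scoring candidate. This is a deliberately simplified illustration (uniform candidate grid, nearest-bin lookup, 1/√k weights); the published algorithm uses decaying-cosine kernels, ERB-scale spectral integration, and further refinements not shown here.

```python
import numpy as np

def swipe_like_pitch(x, fs, f0_min=75.0, f0_max=500.0):
    """Toy pitch estimator in the spirit of SWIPE': score each candidate f0
    by the spectral magnitude at its first and prime harmonics (simplified;
    the real algorithm uses decaying-cosine kernels and ERB-scale weighting)."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    harmonics = [1, 2, 3, 5, 7, 11, 13]   # first and prime harmonics only
    candidates = np.arange(f0_min, f0_max, 1.0)
    scores = []
    for f0 in candidates:
        s = 0.0
        for k in harmonics:
            fk = k * f0
            if fk >= fs / 2:
                break
            s += spec[np.argmin(np.abs(freqs - fk))] / np.sqrt(k)
        scores.append(s)
    return candidates[int(np.argmax(scores))]

fs = 8000
t = np.arange(0, 0.5, 1.0 / fs)
# sawtooth-like test signal with f0 = 200 Hz (sum of 1/k-weighted harmonics)
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 10))
f0_hat = swipe_like_pitch(x, fs)
```

Restricting the harmonic set to the first and prime harmonics is what suppresses subharmonic candidates: a candidate at f0/2 only collects energy at every other harmonic, so its score stays well below that of the true pitch.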

414 citations


Journal ArticleDOI
TL;DR: This article reviews Handbook of Noise and Vibration Control by Malcolm J. Crocker, New Jersey, 2007.
Abstract: This article reviews Handbook of Noise and Vibration Control by Malcolm J. Crocker, New Jersey, 2007. 1584 pp. Price: $195.00 (hardcover). ISBN: 0471395994

332 citations


Journal ArticleDOI
TL;DR: The relation between speaker acuity and amount of compensation to auditory perturbation is mediated by the size of speakers' auditory goal regions, with more acute speakers having smaller goal regions.
Abstract: The role of auditory feedback in speech motor control was explored in three related experiments. Experiment 1 investigated auditory sensorimotor adaptation: the process by which speakers alter their speech production to compensate for perturbations of auditory feedback. When the first formant frequency (F1) was shifted in the feedback heard by subjects as they produced vowels in consonant-vowel-consonant (CVC) words, the subjects’ vowels demonstrated compensatory formant shifts that were maintained when auditory feedback was subsequently masked by noise—evidence of adaptation. Experiment 2 investigated auditory discrimination of synthetic vowel stimuli differing in F1 frequency, using the same subjects. Those with more acute F1 discrimination had compensated more for F1 perturbation. Experiment 3 consisted of simulations with the Directions Into Velocities of Articulators (DIVA) model of speech motor planning, which showed that the model can account for key aspects of compensation. In the model, movement goals fo...

274 citations


Journal ArticleDOI
TL;DR: This paper proposes to explicitly examine the difference between urban soundscapes and polyphonic music with respect to their modeling with the BOF approach, and reveals critical differences in the temporal and statistical structure of the typical frame distribution of each type of signal.
Abstract: The "bag-of-frames" approach (BOF) to audio pattern recognition represents signals as the long-term statistical distribution of their local spectral features. This approach has proved nearly optimal for simulating the auditory perception of natural and human environments (or soundscapes), and is also the predominant paradigm for extracting high-level descriptions from music signals. However, recent studies show that, contrary to its application to soundscape signals, BOF only provides limited performance when applied to polyphonic music signals. This paper proposes to explicitly examine the difference between urban soundscapes and polyphonic music with respect to their modeling with the BOF approach. First, the application of the same measure of acoustic similarity on both soundscape and music data sets confirms that the BOF approach can model soundscapes to near-perfect precision, and exhibits none of the limitations observed in the music data set. Second, the modification of this measure by two custom homogeneity transforms reveals critical differences in the temporal and statistical structure of the typical frame distribution of each type of signal. Such differences may explain the uneven performance of BOF algorithms on soundscapes and music signals, and suggest that their human perception relies on cognitive processes of a different nature.
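The long-term-statistics idea behind BOF can be sketched by summarizing each signal as a diagonal Gaussian over its frame-level log spectra and comparing signatures with a symmetric KL divergence. The feature choice and distance here are generic stand-ins, not the acoustic-similarity measure or homogeneity transforms used in the paper.

```python
import numpy as np

def frame_features(x, fs, frame=512, hop=256):
    """Per-frame log-magnitude spectra: the 'local spectral features' of BOF."""
    frames = [x[i:i + frame] * np.hanning(frame)
              for i in range(0, len(x) - frame, hop)]
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-9)

def bof_signature(x, fs):
    """Long-term statistics of the frame distribution (diagonal Gaussian)."""
    F = frame_features(x, fs)
    return F.mean(axis=0), F.var(axis=0) + 1e-9

def bof_distance(sig_a, sig_b):
    """Symmetric KL divergence between two diagonal-Gaussian signatures
    (the log-determinant terms of the two directions cancel)."""
    (ma, va), (mb, vb) = sig_a, sig_b
    return 0.5 * np.sum(va / vb + vb / va
                        + (ma - mb) ** 2 * (1 / va + 1 / vb) - 2)

fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * t)       # stands in for a harmonic 'music' clip
noise = rng.standard_normal(len(t))      # stands in for a broadband soundscape
d_same = bof_distance(bof_signature(tone, fs), bof_signature(tone, fs))
d_diff = bof_distance(bof_signature(tone, fs), bof_signature(noise, fs))
```

Note what this representation throws away: the temporal ordering of frames. The paper's point is precisely that this loss is harmless for soundscapes but costly for polyphonic music.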

251 citations


Journal ArticleDOI
TL;DR: Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
Abstract: The evaluation of intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise including babble, car, street and train at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical model based and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms found in previous studies to perform best in terms of overall quality were not the same algorithms that performed best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
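The information transmission analysis mentioned above (in the Miller-and-Nicely tradition) reduces a confusion matrix to the fraction of stimulus information that reaches the listener: relative transmitted information T = I(stimulus; response) / H(stimulus). A minimal sketch, with hypothetical two-category confusion matrices standing in for a binary feature such as place:

```python
import math

def transmitted_information(confusions):
    """Relative transmitted information T = I(stimulus; response) / H(stimulus)
    computed from a confusion matrix (rows: stimuli, cols: responses)."""
    total = sum(sum(row) for row in confusions)
    n = len(confusions)
    p_s = [sum(row) / total for row in confusions]                 # stimulus probs
    p_r = [sum(confusions[i][j] for i in range(n)) / total
           for j in range(n)]                                      # response probs
    mi = 0.0
    for i in range(n):
        for j in range(n):
            p = confusions[i][j] / total
            if p > 0:
                mi += p * math.log2(p / (p_s[i] * p_r[j]))         # mutual info
    h_s = -sum(p * math.log2(p) for p in p_s if p > 0)             # stimulus entropy
    return mi / h_s

# Hypothetical 2x2 confusion matrices for a binary 'place' feature
perfect = [[50, 0], [0, 50]]     # feature always transmitted -> T = 1
noisy = [[40, 10], [15, 35]]     # feature partially transmitted -> 0 < T < 1
```

T = 1 means the feature is always received correctly; T = 0 means responses are independent of the stimulus, which is why a flat place-feature score across algorithms is such a telling result.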

251 citations


Journal ArticleDOI
TL;DR: Results suggest that native and non-native listeners apply similar strategies for speech-in-noise perception: the crucial difference is in the signal clarity required for contextual information to be effective, rather than in an inability of non- native listeners to take advantage of this contextual information per se.
Abstract: Previous research has shown that speech recognition differences between native and proficient non-native listeners emerge under suboptimal conditions. Current evidence has suggested that the key deficit that underlies this disproportionate effect of unfavorable listening conditions for non-native listeners is their less effective use of compensatory information at higher levels of processing to recover from information loss at the phoneme identification level. The present study investigated whether this non-native disadvantage could be overcome if enhancements at various levels of processing were presented in combination. Native and non-native listeners were presented with English sentences in which the final word varied in predictability and which were produced in either plain or clear speech. Results showed that, relative to the low-predictability-plain-speech baseline condition, non-native listener final word recognition improved only when both semantic and acoustic enhancements were available (high-predictability-clear-speech). In contrast, the native listeners benefited from each source of enhancement separately and in combination. These results suggest that native and non-native listeners apply similar strategies for speech-in-noise perception: The crucial difference is in the signal clarity required for contextual information to be effective, rather than in an inability of non-native listeners to take advantage of this contextual information per se.

249 citations


PatentDOI
Kazuo Sumita
TL;DR: In this patent, a speech recognition apparatus includes a first-candidate selecting unit that selects a recognition result of a first speech from first recognition candidates based on likelihood of the first recognition candidates.
Abstract: A speech recognition apparatus includes a first-candidate selecting unit that selects a recognition result of a first speech from first recognition candidates based on likelihood of the first recognition candidates; a second-candidate selecting unit that extracts recognition candidates of an object word contained in the first speech and recognition candidates of a clue word from second recognition candidates, acquires the relevance ratio associated with the semantic relation between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word, and selects a recognition result of the second speech based on the acquired relevance ratio; a correction-portion identifying unit that identifies a portion corresponding to the object word in the first speech; and a correcting unit that corrects the word in the identified portion.

249 citations


PatentDOI
TL;DR: A personalized audio system and method that overcomes many of the broadcast-type disadvantages associated with conventional radio stations is proposed in this patent.
Abstract: A personalized audio system and method that overcomes many of the broadcast-type disadvantages associated with conventional radio stations.

241 citations


Journal ArticleDOI
TL;DR: This study provides evidence of a behavioral change in the sound production of right whales that is correlated with increased noise levels, and indicates that right whales may shift call frequency to compensate for increased band-limited background noise.
Abstract: The impact of anthropogenic noise on marine mammals has been an area of increasing concern over the past two decades. Most low-frequency anthropogenic noise in the ocean comes from commercial shipping which has contributed to an increase in ocean background noise over the past 150 years. The long-term impacts of these changes on marine mammals are not well understood. This paper describes both short- and long-term behavioral changes in calls produced by the endangered North Atlantic right whale (Eubalaena glacialis) and South Atlantic right whale (Eubalaena australis) in the presence of increased low-frequency noise. Right whales produce calls with a higher average fundamental frequency and they call at a lower rate in high noise conditions, possibly in response to masking from low-frequency noise. The long-term changes have occurred within the known lifespan of individual whales, indicating that a behavioral change, rather than selective pressure, has resulted in the observed differences. This study provides evidence of a behavioral change in sound production of right whales that is correlated with increased noise levels and indicates that right whales may shift call frequency to compensate for increased band-limited background noise.

Journal ArticleDOI
TL;DR: This article reviews Multiple Scattering: Interaction of Time-Harmonic Waves with N Obstacles by P. A. Martin, 2006. 450 pp. Price: $140.00 (hardcover). ISBN: 0-521-86554-9
Abstract: This article reviews Multiple Scattering, Interaction of Time-Harmonic Waves with N Obstacles by P. A. Martin, 2006. 450 pp. Price: $140.00 (hardcover). ISBN: 0-521-86554-9

Journal ArticleDOI
TL;DR: These findings demonstrate informational masking on sentence-in-noise recognition in the form of "linguistic interference"; whether this interference operates at the lexical, sublexical, and/or prosodic levels of linguistic structure, and whether it is modulated by the phonetic similarity between the target and noise languages, remains to be determined.
Abstract: Studies of speech perception in various types of background noise have shown that noise with linguistic content affects listeners differently than nonlinguistic noise [e.g., Simpson, S. A., and Cooke, M. (2005). "Consonant identification in N-talker babble is a nonmonotonic function of N," J. Acoust. Soc. Am. 118, 2775-2778; Sperry, J. L., Wiley, T. L., and Chial, M. R. (1997). "Word recognition performance in various background competitors," J. Am. Acad. Audiol. 8, 71-80] but few studies of multi-talker babble have employed background babble in languages other than the target speech language. To determine whether the adverse effect of background speech is due to the linguistic content or to the acoustic characteristics of the speech masker, this study assessed speech-in-noise recognition when the language of the background noise was either the same or different from the language of the target speech. Replicating previous findings, results showed poorer English sentence recognition by native English listeners in six-talker babble than in two-talker babble, regardless of the language of the babble. In addition, our results showed that in two-talker babble, native English listeners were more adversely affected by English babble than by Mandarin Chinese babble. These findings demonstrate informational masking on sentence-in-noise recognition in the form of "linguistic interference." Whether this interference is at the lexical, sublexical, and/or prosodic levels of linguistic structure and whether it is modulated by the phonetic similarity between the target and noise languages remains to be determined.

Journal ArticleDOI
TL;DR: K-space methods are well suited to modeling high-frequency acoustics applications as they require fewer mesh points per wavelength than conventional finite element and finite difference models, and larger time steps can be taken without a loss of stability or accuracy.
Abstract: Biomedical applications of photoacoustics, in particular photoacoustic tomography, require efficient models of photoacoustic propagation that can incorporate realistic properties of soft tissue, such as acoustic inhomogeneities, both for purposes of simulation and for use in model-based image reconstruction methods. k-space methods are well suited to modeling high-frequency acoustics applications as they require fewer mesh points per wavelength than conventional finite element and finite difference models, and larger time steps can be taken without a loss of stability or accuracy. They are also straightforward to encode numerically, making them appealing as a general tool. The rationale behind k-space methods and the k-space approach to the numerical modeling of photoacoustic waves in fluids are covered in this paper. Three existing k-space models are applied to photoacoustics and demonstrated with examples: an exact model for homogeneous media, a second-order model that can take into account heterogeneous media, and a first-order model that can incorporate absorbing boundary conditions.
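The exact homogeneous-media model mentioned in the abstract has a particularly compact form: each spatial-frequency component of the initial pressure simply oscillates as cos(c|k|t), so propagation is a single FFT, a pointwise multiply, and an inverse FFT. A 1-D sketch (the grid size, sound speed, and Gaussian source are arbitrary test values, not tissue parameters):

```python
import numpy as np

def kspace_propagate(p0, dx, c, t):
    """Exact k-space time propagation of an initial pressure distribution in a
    1-D homogeneous lossless medium: P(k, t) = P0(k) * cos(c*|k|*t).
    Higher-order k-space schemes extend this idea to heterogeneous media."""
    n = len(p0)
    k = 2 * np.pi * np.fft.fftfreq(n, dx)   # angular spatial frequencies
    P0 = np.fft.fft(p0)
    return np.real(np.fft.ifft(P0 * np.cos(c * np.abs(k) * t)))

n, dx, c = 1024, 1e-4, 1500.0               # 0.1 mm grid, water-like sound speed
x = (np.arange(n) - n // 2) * dx
p0 = np.exp(-(x / (5 * dx)) ** 2)           # Gaussian initial pressure
t = 200 * dx / c                            # time for waves to travel 200 cells
p = kspace_propagate(p0, dx, c, t)
```

In 1-D this reproduces the d'Alembert solution exactly: the initial Gaussian splits into two half-amplitude copies travelling left and right, with no numerical dispersion regardless of the time step, which is the stability/accuracy advantage the abstract refers to.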

PatentDOI
TL;DR: In this patent, a dielectric substance is laminated on one surface of a piezoelectric substance, and an IDT and reflectors are disposed as electrodes at the boundary between the piezoelectric substance and the dielectric substance.
Abstract: A dielectric substance is laminated on one surface of a piezoelectric substance, and an IDT and reflectors are disposed as electrodes at a boundary between the piezoelectric substance and the dielectric substance, and the thickness of the electrodes is determined so that the acoustic velocity of the Stoneley wave is made lower than that of a slow transverse wave propagating through the dielectric substance and that of a slow transverse wave propagating through the piezoelectric substance, thereby forming a boundary acoustic wave device.

Journal ArticleDOI
TL;DR: The oscillation of these microbubbles in small vessels is directly observed and shown to be substantially different from that predicted by previous models and from that imaged within large fluid volumes.
Abstract: Many thousands of contrast ultrasound studies have been conducted in clinics around the world. In addition, the microbubbles employed in these examinations are being widely investigated to deliver drugs and genes. Here, for the first time, the oscillation of these microbubbles in small vessels is directly observed and shown to be substantially different from that predicted by previous models and from that imaged within large fluid volumes. Using pulsed ultrasound with a center frequency of 1 MHz and peak rarefactional pressure of 0.8 or 2.0 MPa, microbubble expansion was significantly reduced when microbubbles were constrained within small vessels in the rat cecum (p < 0.05). A model for microbubble oscillation within compliant vessels is presented that accurately predicts oscillation and vessel displacement within small vessels. As a result of the decreased oscillation in small vessels, a large resting microbubble diameter resulting from agent fusion or a high mechanical index was required to bring the agent shell into contact with the endothelium. Also, contact with the endothelium was observed during asymmetrical collapse, not during expansion. These results will be used to improve the design of drug delivery techniques using microbubbles.

Journal ArticleDOI
TL;DR: This review paper is intended for researchers from diverse backgrounds, including acousticians, who may not be familiar in detail with soliton theory, and therefore includes an outline of the basics of soliton theory.
Abstract: Nonlinear internal waves in the ocean are discussed (a) from the standpoint of soliton theory and (b) from the viewpoint of experimental measurements. First, theoretical models for internal solitary waves in the ocean are briefly described. Various nonlinear analytical solutions are treated, commencing with the well-known Boussinesq and Korteweg–de Vries equations. Then certain generalizations are considered, including effects of cubic nonlinearity, Earth’s rotation, cylindrical divergence, dissipation, shear flows, and others. Recent theoretical models for strongly nonlinear internal waves are outlined. Second, examples of experimental evidence for the existence of solitons in the upper ocean are presented; the data include radar and optical images and in situ measurements of wave forms, propagation speeds, and dispersion characteristics. Third, and finally, action of internal solitons on sound wave propagation is discussed. This review paper is intended for researchers from diverse backgrounds, including acousticians, who may not be familiar in detail with soliton theory. Thus, it includes an outline of the basics of soliton theory. At the same time, recent theoretical and observational results are described which can also make this review useful for mainstream oceanographers and theoreticians.
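The Korteweg–de Vries equation mentioned above admits the classic single-soliton solution, which can be verified numerically by substituting it back into u_t + 6u u_x + u_xxx = 0. This normalization of KdV is one common convention (the ocean-specific models in the review add rotation, cubic nonlinearity, and other terms); the grid and parameters below are arbitrary test values.

```python
import numpy as np

def kdv_soliton(x, t, c):
    """Single-soliton solution u(x, t) = (c/2) sech^2( (sqrt(c)/2) (x - c t) )
    of the KdV equation u_t + 6 u u_x + u_xxx = 0: a pulse of amplitude c/2
    travelling at speed c (taller solitons travel faster)."""
    return 0.5 * c / np.cosh(0.5 * np.sqrt(c) * (x - c * t)) ** 2

# Check the PDE residual with central finite differences.
x = np.linspace(-5.0, 5.0, 2001)
dx, dt, c, t = x[1] - x[0], 1e-4, 4.0, 0.3
u = kdv_soliton(x, t, c)
u_t = (kdv_soliton(x, t + dt, c) - kdv_soliton(x, t - dt, c)) / (2 * dt)
u_x = (u[2:] - u[:-2]) / (2 * dx)                                  # first derivative
u_xxx = (-u[:-4] + 2 * u[1:-3] - 2 * u[3:-1] + u[4:]) / (2 * dx ** 3)  # third derivative
residual = u_t[2:-2] + 6 * u[2:-2] * u_x[1:-1] + u_xxx
```

The residual is limited only by the O(dx²) truncation error of the stencils, confirming that the dispersive term u_xxx exactly balances the nonlinear steepening term 6u u_x — the balance that lets internal solitary waves propagate long distances without changing shape.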

Journal ArticleDOI
TL;DR: Blue and fin whale populations in the Southern Ocean have remained at low numbers for decades since they became protected; using source level and detection range from passive acoustic recordings can help in calculating the relative density of calling whales.
Abstract: Blue (Balaenoptera musculus) and fin whales (B. physalus) produce high-intensity, low-frequency calls, which probably function for communication during mating and feeding. The source levels of blue and fin whale calls off the Western Antarctic Peninsula were calculated using recordings made with calibrated, bottom-moored hydrophones. Blue whales were located up to a range of 200 km using hyperbolic localization and time difference of arrival. The distance to fin whales, estimated using multipath arrivals of their calls, was up to 56 km. The error in range measurements was 3.8 km using hyperbolic localization, and 3.4 km using multipath arrivals. Both species produced high-intensity calls; the average blue whale call source level was 189±3 dB re 1 μPa at 1 m over 25–29 Hz, and the average fin whale call source level was 189±4 dB re 1 μPa at 1 m over 15–28 Hz. Blue and fin whale populations in the Southern Ocean have remained at low numbers for decades since they became protected; using source level and detection range from ...
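Back-calculating a source level from a received level and range reduces, in the simplest textbook case, to adding a transmission-loss term to the received level. This sketch assumes ideal spherical spreading (SL = RL + 20 log10 r, with optional linear absorption); the paper's estimates rest on calibrated hydrophones and measured localization, and the received level below is an illustrative number, not data from the study.

```python
import math

def source_level(received_level_db, range_m, alpha_db_per_km=0.0):
    """Back-calculate a source level (dB re 1 uPa at 1 m) from a received
    level, assuming spherical spreading plus optional linear absorption.
    A textbook passive-acoustics sketch, not the paper's propagation model."""
    tl = 20.0 * math.log10(range_m) + alpha_db_per_km * range_m / 1000.0
    return received_level_db + tl

# e.g. a call received at 83 dB re 1 uPa from a whale localized at 200 km
sl = source_level(83.0, 200e3)
```

At 200 km the spherical-spreading loss alone is about 106 dB, which shows why source levels near 189 dB re 1 μPa at 1 m make these calls detectable at such ranges.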

Journal ArticleDOI
TL;DR: A survey of a growing body of work in which representations of speech production are used to improve automatic speech recognition is provided.
Abstract: Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either acoustic signal or phonetic transcription alone. In this article, a survey of a growing body of work in which such representations are used to improve automatic speech recognition is provided.

Journal ArticleDOI
TL;DR: A physics-based channel model for the very shallow warm-water acoustic channel at high frequencies is developed, which includes time-varying statistical effects as well as non-Gaussian ambient noise statistics observed during channel studies.
Abstract: Underwater acoustic communication is a core enabling technology with applications in ocean monitoring using remote sensors and autonomous underwater vehicles. One of the more challenging underwater acoustic communication channels is the medium-range very shallow warm-water channel, common in tropical coastal regions. This channel exhibits two key features—extensive time-varying multipath and high levels of non-Gaussian ambient noise due to snapping shrimp—both of which limit the performance of traditional communication techniques. A good understanding of the communications channel is key to the design of communication systems. It aids in the development of signal processing techniques as well as in the testing of the techniques via simulation. In this article, a physics-based channel model for the very shallow warm-water acoustic channel at the high frequencies of interest to medium-range communication system developers is developed. The model is based on ray acoustics and includes time-varying statistical effects as well as non-Gaussian ambient noise statistics observed during channel studies. The model is calibrated and its accuracy validated using measurements made at sea.
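A tap-delay-line caricature of such a multipath channel with impulsive ambient noise can be written in a few lines. Everything here is a made-up stand-in: the tap amplitudes and delays are arbitrary, and Student-t noise merely mimics the heavy tails of snapping-shrimp noise, whereas the paper fits ray-acoustic multipath and measured noise statistics.

```python
import numpy as np

def shallow_channel(x, taps, snr_db, nu=2.5, seed=0):
    """Toy tap-delay-line channel: y[n] = sum_i a_i * x[n - d_i] + noise.
    Student-t noise (nu degrees of freedom) stands in for impulsive
    snapping-shrimp noise; taps is a list of (amplitude, delay_in_samples)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(len(x))
    for a, d in taps:
        y[d:] += a * x[:len(x) - d]        # delayed, attenuated arrival
    noise = rng.standard_t(nu, len(x))
    # scale noise to the requested SNR relative to the multipath signal power
    noise *= np.sqrt(np.mean(y ** 2) / np.mean(noise ** 2)) * 10 ** (-snr_db / 20)
    return y + noise

x = np.zeros(4800)
x[0] = 1.0                                  # probe impulse
# hypothetical arrivals: direct path plus two boundary reflections
y = shallow_channel(x, [(1.0, 0), (0.5, 120), (0.3, 300)], snr_db=30)
```

Probing the model with an impulse recovers the impulse response (arrivals at the three tap delays), which is exactly how such simulators are used to stress-test receiver designs before sea trials.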

Journal ArticleDOI
TL;DR: Experimental results show that the three acoustic measures related to the voice source are dependent to varying degrees on age and vowel; age dependencies are more prominent for male talkers, while vowel dependencies are more prominent for female talkers, suggesting a greater vocal tract-source interaction.
Abstract: The effects of age, sex, and vocal tract configuration on the glottal excitation signal in speech are only partially understood, yet understanding these effects is important for both recognition and synthesis of speech as well as for medical purposes. In this paper, three acoustic measures related to the voice source are analyzed for five vowels from 3145 CVC utterances spoken by 335 talkers (8-39 years old) from the CID database [Miller et al., Proceedings of ICASSP, 1996, Vol. 2, pp. 849-852]. The measures are: the fundamental frequency (F0), the difference between the "corrected" (denoted by an asterisk) first two spectral harmonic magnitudes, H1* - H2* (related to the open quotient), and the difference between the "corrected" magnitudes of the first spectral harmonic and that of the third formant peak, H1* - A3* (related to source spectral tilt). The correction refers to compensating for the influence of formant frequencies on spectral magnitude estimation. Experimental results show that the three acoustic measures are dependent to varying degrees on age and vowel. Age dependencies are more prominent for male talkers, while vowel dependencies are more prominent for female talkers suggesting a greater vocal tract-source interaction. All talkers show a dependency of F0 on sex and on F3, and of H1* - A3* on vowel type. For low-pitched talkers (F0 ≤ 175 Hz), H1* - H2* is positively correlated with F0 while for high-pitched talkers, H1* - H2* is dependent on F1 or vowel height. For high-pitched talkers there were no significant sex dependencies of H1* - H2* and H1* - A3*. The statistical significance of these results is shown.

Journal ArticleDOI
TL;DR: The results indicate that a chirp is a more efficient stimulus than a click for the recording of early auditory evoked responses in normal-hearing adults using transient sounds at a high rate of stimulation.
Abstract: This study investigates the use of chirp stimuli to compensate for the cochlear traveling wave delay. The temporal dispersion in the cochlea is given by the traveling time, which in this study is estimated from latency-frequency functions obtained from (1) a cochlear model, (2) tone-burst auditory brain stem response (ABR) latencies, and (3) narrow-band ABR latencies. These latency-frequency functions are assumed to reflect the group delay of a linear system that modifies the phase spectrum of the applied stimulus. On the basis of this assumption, three chirps are constructed and evaluated in 49 normal-hearing subjects. The auditory steady-state responses to these chirps and to a click stimulus are compared at two levels of stimulation (30 and 50 dB nHL) and a rate of 90/s. The chirps give shorter detection time and higher signal-to-noise ratio than the click. The shorter detection time obtained by the chirps is equivalent to an increase in stimulus level of 20 dB or more. The results indicate that a chirp is a more efficient stimulus than a click for the recording of early auditory evoked responses in normal-hearing adults using transient sounds at a high rate of stimulation.
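The chirp-design idea (give the stimulus a group delay of t0 − τ(f), so that stimulus delay plus cochlear traveling-wave delay τ(f) is constant across frequency) can be sketched in the frequency domain: integrate the desired group delay to get a phase spectrum, and inverse-transform a flat-magnitude spectrum with that phase. The latency function below is a toy power law clamped below 100 Hz, not the cochlear-model or ABR-derived functions evaluated in the study.

```python
import numpy as np

def delay_compensating_chirp(fs, n, tau, t0):
    """Build a flat-magnitude stimulus whose group delay at frequency f is
    t0 - tau(f), so low frequencies are presented earlier than high ones.
    phase(f) = -2*pi * integral of group delay df (trapezoidal cumulative sum)."""
    f = np.fft.rfftfreq(n, 1.0 / fs)
    gd = t0 - tau(f)                         # desired group delay per frequency
    df = f[1] - f[0]
    phase = -2 * np.pi * np.concatenate(
        ([0.0], np.cumsum(0.5 * (gd[1:] + gd[:-1]) * df)))
    return np.fft.irfft(np.exp(1j * phase), n)

fs, n = 16000, 4096
# toy cochlear latency model: longer delays at low frequencies
tau = lambda f: 5e-3 * (np.maximum(f, 100.0) / 1000.0) ** -0.4
x = delay_compensating_chirp(fs, n, tau, t0=0.015)
```

Because only the phase is shaped, the chirp has the same flat magnitude spectrum as a click; it differs purely in when each frequency arrives, which is why it can synchronize the neural response across the cochlear partition.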

Journal ArticleDOI
TL;DR: Singing appears to be a universal human trait, but two of the occasional singers maintained a high rate of pitch errors at the slower tempo, suggesting the existence of a purely vocal form of tone deafness.
Abstract: Most believe that the ability to carry a tune is unevenly distributed in the general population. To test this claim, we asked occasional singers (n = 62) to sing a well-known song in both the laboratory and in a natural setting (experiment 1). Sung performances were judged by peers for proficiency, analyzed for pitch and time accuracy with an acoustic-based method, and compared to professional singing. The peer ratings for the proficiency of occasional singers were normally distributed. Only a minority of the occasional singers made numerous pitch errors. The variance in singing proficiency was largely due to tempo differences. Occasional singers tended to sing at a faster tempo and with more pitch and time errors relative to professional singers. In experiment 2, 15 nonmusicians from experiment 1 sang the same song at a slow tempo. In this condition, most of the occasional singers sang as accurately as the professional singers. Thus, singing appears to be a universal human trait. However, two of the occasional singers maintained a high rate of pitch errors at the slower tempo. This poor performance was not due to impaired pitch perception, thus suggesting the existence of a purely vocal form of tone deafness.

Journal ArticleDOI
TL;DR: The Los Alamos thermoacoustics code, available at www.lanl.gov/thermoacoustics/, has undergone extensive revision this year; new calculation features have been added to the Fortran computational core, and a Python-based graphical user interface wrapped around that core provides improved usability.
Abstract: The Los Alamos thermoacoustics code, available at www.lanl.gov/thermoacoustics/, has undergone extensive revision this year. New calculation features have been added to the original Fortran computational core, and a Python‐based graphical user interface wrapped around that core provides improved usability. A plotter routinely displays thermoacoustic wave properties as a function of x or tracks results when a user‐specified input variable, such as frequency or amplitude, is varied. The Windows‐like user interface provides mouse‐based control, scrolling, and simultaneous displays of plots and of several categories of numerical values, in which color indicates important features. Thermoacoustic phenomena can be calculated with superimposed steady flow, and time‐averaged pressure gradients are calculated. In thermoacoustic systems with toroidal topology, this allows modeling of steady flow caused by gas diodes (with or without time‐averaged heat transfer) and Gedeon streaming. Thermoacoustic mixture separation is included, also with superimposed steady flow. The volume integral of the complex gas momentum is available, so vibrations of thermoacoustic systems can be analyzed.

Journal ArticleDOI
TL;DR: It is demonstrated that for spoken sentences, vowels carry more information about sentence intelligibility than consonants for both young normal-hearing and elderly hearing-impaired listeners.
Abstract: The purpose of this study was to examine the contribution of information provided by vowels versus consonants to sentence intelligibility in young normal-hearing (YNH) and typical elderly hearing-impaired (EHI) listeners. Sentences were presented in three conditions, unaltered or with either the vowels or the consonants replaced with speech shaped noise. Sentences from male and female talkers in the TIMIT database were selected. Baseline performance was established at a 70 dB SPL level using YNH listeners. Subsequently EHI and YNH participants listened at 95 dB SPL. Participants listened to each sentence twice and were asked to repeat the entire sentence after each presentation. Words were scored correct if identified exactly. Average performance for unaltered sentences was greater than 94%. Overall, EHI listeners performed more poorly than YNH listeners. However, vowel-only sentences were always significantly more intelligible than consonant-only sentences, usually by a ratio of 2:1 across groups. In contrast to written English or words spoken in isolation, these results demonstrated that for spoken sentences, vowels carry more information about sentence intelligibility than consonants for both young normal-hearing and elderly hearing-impaired listeners.

Journal ArticleDOI
TL;DR: This talk explores the history of modern sound, and of the culture that created it, in early twentieth-century America.
Abstract: The American soundscape changed dramatically during the early decades of the twentieth century as new acoustical developments transformed both what people heard and the ways that they listened. What they heard was a new kind of sound that was the product of modern technology. They listened as newly critical consumers of aural commodities. Reverberation equations, sound meters, microphones, and acoustical tiles were deployed in places as varied as Boston’s Symphony Hall, New York’s office skyscrapers, and the sound stages of Hollywood. The result was that the many different spaces that constituted modern America began to sound alike—clear, direct, efficient, and non‐reverberant. While this new modern sound said little about the physical spaces in which it was produced, it has much to tell us about the culture that created it. This talk will explore the history of modern sound and modern culture in early twentieth‐century America.

Journal ArticleDOI
TL;DR: The impairments in speech understanding were generally similar to those found in CI listeners with similar SMTs, suggesting that variability in spread of neural activation largely accounts for the variability in speech perception of CI listeners.
Abstract: Spectral resolution has been reported to be closely related to vowel and consonant recognition in cochlear implant (CI) listeners. One measure of spectral resolution is spectral modulation threshold (SMT), which is defined as the smallest detectable spectral contrast in the spectral ripple stimulus. SMT may be determined by the activation pattern associated with electrical stimulation. In the present study, broad activation patterns were simulated using a multi-band vocoder to determine if similar impairments in speech understanding scores could be produced in normal-hearing listeners. Tokens were first decomposed into 15 logarithmically spaced bands and then re-synthesized by multiplying the envelope of each band by matched filtered noise. Various amounts of current spread were simulated by adjusting the drop-off of the noise spectrum away from the peak (40–5 dB/octave). The average SMT (0.25 and 0.5 cycles/octave) increased from 6.3 to 22.5 dB, while average vowel identification scores dropped from 86% to ...
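The simulated current spread can be illustrated by the band weighting it implies. The sketch below is a hypothetical reconstruction, not the authors' vocoder: it only computes how strongly a noise carrier centered on one band leaks into neighboring bands for a given spectral slope in dB/octave, with the band spacing and weighting rule assumed.

```python
import math

def spread_weights(center_freqs, peak_index, slope_db_per_octave):
    """Linear-amplitude weight applied to each band when the noise spectrum
    drops off by `slope_db_per_octave` away from the peak band."""
    f_peak = center_freqs[peak_index]
    weights = []
    for f in center_freqs:
        octaves = abs(math.log2(f / f_peak))          # distance from the peak
        weights.append(10.0 ** (-slope_db_per_octave * octaves / 20.0))
    return weights
```

With a steep 40 dB/octave slope, a band one octave from the peak is attenuated to 1% amplitude (sharp, CI-like channel selectivity); at 5 dB/octave the same band retains about 56%, simulating broad current spread.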

Journal ArticleDOI
TL;DR: The nonlinear behavior of quasi-incompressible soft solids is investigated using the supersonic shear imaging technique based on the remote generation of polarized plane shear waves in tissues induced by the acoustic radiation force.
Abstract: The assessment of viscoelastic properties of soft tissues is enjoying a growing interest in the field of medical imaging as pathologies are often correlated with a local change of stiffness. To date, advanced techniques in that field have concentrated on the estimation of the second-order elastic modulus (μ). In this paper, the nonlinear behavior of quasi-incompressible soft solids is investigated using the supersonic shear imaging technique based on the remote generation of polarized plane shear waves in tissues induced by the acoustic radiation force. Applying a theoretical approach of the strain energy in soft solids [Hamilton et al., J. Acoust. Soc. Am. 116, 41-44 (2004)], it is shown that the well-known acoustoelasticity experiment allowing the recovery of higher-order elastic moduli can be greatly simplified. Experimentally, it requires measurements of the local speed of polarized plane shear waves in a statically and uniaxially stressed isotropic medium. These shear wave speed estimates are obtained by imaging the shear wave propagation in soft media with an ultrafast echographic scanner. In this situation, the uniaxial static stress induces anisotropy due to the nonlinear effects and results in a change of shear wave speed. Then the third-order elastic modulus (A) is measured in agar-gelatin-based phantoms and polyvinyl alcohol-based phantoms.
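For reference, the strain-energy expansion of Hamilton et al. cited in this abstract, which separates shear from compressional deformation in a quasi-incompressible solid, can be sketched as

```latex
E = \mu\, I_2 + \frac{A}{3}\, I_3 + D\, I_2^{2},
\qquad
I_2 = \operatorname{tr}\!\left(\boldsymbol{\varepsilon}^{2}\right),
\quad
I_3 = \operatorname{tr}\!\left(\boldsymbol{\varepsilon}^{3}\right),
```

where ε is the strain tensor, μ the second-order (shear) modulus, A the third-order modulus measured in the paper, and D a fourth-order shear modulus. The specific coefficients relating the stressed shear-wave speeds to μ and A are derived in the paper itself and are not reproduced here.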

PatentDOI
TL;DR: In this paper, a system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection, is presented; the system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer without human intervention.
Abstract: A system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection. Such a system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer, without human intervention. A voicemail retrieved from the telephone company's computer can be stored in a digital format on the user's computer. In such a format, the voicemail can be readily archived, or even distributed throughout a network, such as the Internet, in a digital form, such as an email attachment. Preferably a computationally efficient audio recognition algorithm is employed by the user's computer to respond to and navigate the automated audio menu of the telephone company's computer.
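The patent excerpt does not disclose which "computationally efficient audio recognition algorithm" is used. As a hedged illustration of one classic low-cost approach to navigating automated telephone menus, the sketch below detects DTMF key tones with the Goertzel algorithm (a single-bin DFT); all function names and parameters here are illustrative assumptions, not the patented method.

```python
import math

DTMF_ROWS = [697.0, 770.0, 852.0, 941.0]     # standard DTMF row tones (Hz)
DTMF_COLS = [1209.0, 1336.0, 1477.0, 1633.0]  # standard DTMF column tones (Hz)
KEYS = ["123A", "456B", "789C", "*0#D"]

def goertzel_power(samples, freq, sample_rate):
    """Signal power at a single frequency bin via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detect_dtmf(samples, sample_rate=8000):
    """Return the key whose row/column tone pair dominates the frame."""
    row = max(range(4), key=lambda i: goertzel_power(samples, DTMF_ROWS[i], sample_rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, DTMF_COLS[i], sample_rate))
    return KEYS[row][col]
```

Goertzel evaluates only the eight bins of interest instead of a full FFT, which is why it is a common choice on low-power telephony hardware.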

PatentDOI
TL;DR: In this article, a stereo implementation of binaural cue coding is presented in which pairs of left- and right-channel spectral components are downmixed to mono; the mono components, together with the spectral components that were not downmixed, are converted back to the time domain to form hybrid stereo signals, which can then be encoded using conventional coding techniques.
Abstract: Part of the spectrum of two or more input signals is encoded using conventional coding techniques, while encoding the rest of the spectrum using binaural cue coding (BCC). In BCC coding, spectral components of the input signals are downmixed and BCC parameters (e.g., inter-channel level and/or time differences) are generated. In a stereo implementation, after converting the left and right channels to the frequency domain, pairs of left- and right-channel spectral components are downmixed to mono. The mono components are then converted back to the time domain, along with those left- and right-channel spectral components that were not downmixed, to form hybrid stereo signals, which can then be encoded using conventional coding techniques. For playback, the encoded bitstream is decoded using conventional decoding techniques. BCC synthesis techniques may then apply the BCC parameters to synthesize an auditory scene based on the mono components as well as the unmixed stereo components.
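The per-band downmix and parameter generation described above can be sketched minimally as follows. This is an illustrative toy, not the patented codec: the band partitioning, quantization of the BCC parameters, and the sum downmix rule are assumptions, and only the inter-channel level difference (one of the cue types named in the abstract) is computed.

```python
import math

def bcc_downmix(left_bands, right_bands, eps=1e-12):
    """Downmix per-band spectral components to mono and compute inter-channel
    level differences (ILD, in dB) as BCC side information."""
    mono, ild_db = [], []
    for l, r in zip(left_bands, right_bands):
        mono.append(0.5 * (l + r))  # simple sum downmix of the L/R pair
        # eps guards against log of zero for silent bands.
        ild_db.append(20.0 * math.log10((abs(l) + eps) / (abs(r) + eps)))
    return mono, ild_db
```

A matching synthesis stage would rescale each mono band by the transmitted ILD to reconstruct left and right channels, which is the role BCC synthesis plays at the decoder in the abstract's description.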