
Showing papers in "Journal of the Acoustical Society of America in 2004"


Journal ArticleDOI
TL;DR: This paper describes a shortened and improved version of the Speech in Noise (SIN) Test, which measures the SNR a listener requires to understand 50% of key words in sentences in a background of babble.
Abstract: This paper describes a shortened and improved version of the Speech in Noise (SIN) Test (Etymotic Research, 1993). In the first two of four experiments, the level of a female talker relative to that of four-talker babble was adjusted sentence by sentence to produce 50% correct scores for normal-hearing subjects. In the second two experiments, those sentences-in-babble that produced either lack of equivalence or high across-subject variability in scores were discarded. These experiments produced 12 equivalent lists, each containing six sentences, with one sentence at each adjusted signal-to-noise ratio of 25, 20, 15, 10, 5, and 0 dB. Six additional lists were also made equivalent when the scores of particular pairs were averaged. The final lists comprise the "QuickSIN" test that measures the SNR a listener requires to understand 50% of key words in sentences in a background of babble. The standard deviation of single-list scores is 1.4 dB SNR for hearing-impaired subjects, based on test-retest data. A single QuickSIN list takes approximately one minute to administer and provides an estimate of SNR loss accurate to +/-2.7 dB at the 95% confidence level.
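Because each list pairs one sentence with each descending SNR step, the SNR-50 can be read off with a Spearman–Kärber-style calculation. A minimal sketch, assuming one sentence per level from 25 down to 0 dB in 5-dB steps and five scored key words per sentence (illustrative defaults, not the published QuickSIN scoring constants):

```python
def snr50_spearman_karber(total_correct, max_snr=25.0, step=5.0, words_per_snr=5):
    """Estimate the SNR at which 50% of key words are understood.

    Assumes a descending-level test with one sentence per SNR step and a
    fixed number of key words per sentence; the defaults here are
    illustrative, not the published QuickSIN scoring rule.
    """
    return max_snr + step / 2.0 - step * (total_correct / words_per_snr)
```

Under these assumptions, all 30 words correct yields −2.5 dB and none correct yields 27.5 dB, the endpoints of the measurable range.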

671 citations


Journal ArticleDOI
TL;DR: The findings emphasize the flexibility of human speech processing and require models of spoken word recognition that can rapidly accommodate significant acoustic-phonetic deviations from native language speech patterns.
Abstract: This study explored the perceptual benefits of brief exposure to non-native speech. Native English listeners were exposed to English sentences produced by non-native speakers. Perceptual processing speed was tracked by measuring reaction times to visual probe words following each sentence. Three experiments using Spanish- and Chinese-accented speech indicate that processing speed is initially slower for accented speech than for native speech but that this deficit diminishes within one minute of exposure. Control conditions rule out explanations for the adaptation effect based on practice with the task and general strategies for dealing with difficult speech. Further results suggest that adaptation can occur within as few as two to four sentence-length utterances. The findings emphasize the flexibility of human speech processing and require models of spoken word recognition that can rapidly accommodate significant acoustic-phonetic deviations from native language speech patterns.

486 citations


Journal ArticleDOI
TL;DR: Data suggest that the benefit of binaural hearing for speech intelligibility is especially pronounced when there are multiple voiced interferers at different locations from the target, regardless of spatial configuration; measurements with fewer or with other types of interferers can underestimate this benefit.
Abstract: The “cocktail party problem” was studied using virtual stimuli whose spatial locations were generated using anechoic head-related impulse responses from the AUDIS database [Blauert et al., J. Acoust. Soc. Am. 103, 3082 (1998)]. Speech reception thresholds (SRTs) were measured for Harvard IEEE sentences presented from the front in the presence of one, two, or three interfering sources. Four types of interferer were used: (1) other sentences spoken by the same talker, (2) time-reversed sentences of the same talker, (3) speech-spectrum shaped noise, and (4) speech-spectrum shaped noise, modulated by the temporal envelope of the sentences. Each interferer was matched to the spectrum of the target talker. Interferers were placed in several spatial configurations, either coincident with or separated from the target. Binaural advantage was derived by subtracting SRTs from listening with the “better monaural ear” from those for binaural listening. For a single interferer, there was a binaural advantage of 2–4 dB for all interferer types. For two or three interferers, the advantage was 2–4 dB for noise and speech-modulated noise, and 6–7 dB for speech and time-reversed speech. These data suggest that the benefit of binaural hearing for speech intelligibility is especially pronounced when there are multiple voiced interferers at different locations from the target, regardless of spatial configuration; measurements with fewer or with other types of interferers can underestimate this benefit.

482 citations


PatentDOI
TL;DR: A speech recognition technique is described that has the dual benefits of not requiring collection of recordings for training while using computational resources that are cost-compatible with consumer electronic products.
Abstract: A speech recognition technique is described that has the dual benefits of not requiring collection of recordings for training while using computational resources that are cost-compatible with consumer electronic products. Methods are described for improving the recognition accuracy of a recognizer by developer interaction with a design tool that iterates the recognition data during development of a recognition set of utterances and that allows controlling and minimizing the computational resources required to implement the recognizer in hardware.

430 citations


Journal ArticleDOI
TL;DR: A statistically significant dose–response relationship was found, with a higher proportion of people reporting perception of and annoyance with wind turbine noise than expected from present dose–response relationships for transportation noise; respondents' attitude to the visual impact of wind turbines on the landscape was also found to influence noise annoyance.
Abstract: Installed global wind power increased by 26% during 2003, with the U.S. and Europe accounting for 90% of the cumulative capacity. Little is known about wind turbines’ impact on people living in their vicinity. The aims of this study were to evaluate the prevalence of annoyance due to wind turbine noise and to study dose–response relationships. Interrelationships between noise annoyance and sound characteristics, as well as the influence of subjective variables such as attitude and noise sensitivity, were also assessed. A cross-sectional study was performed in Sweden in 2000. Responses were obtained through questionnaires (n=351; response rate 68.4%), and doses were calculated as A-weighted sound pressure levels for each respondent. A statistically significant dose–response relationship was found, showing a higher proportion of people reporting perception and annoyance than expected from the present dose–response relationships for transportation noise. The unexpectedly high proportion of annoyance could be due to visual interference influencing noise annoyance, as well as the presence of intrusive sound characteristics. The respondents’ attitude to the visual impact of wind turbines on the landscape scenery was found to influence noise annoyance.

401 citations


Journal ArticleDOI
TL;DR: A method is presented to solve for the shear elasticity and viscosity of a homogeneous medium by measuring shear wave speed dispersion and fitting a theoretical model to obtain the complex stiffness of the medium.
Abstract: The propagation speed of shear waves is related to frequency and the complex stiffness (shear elasticity and viscosity) of the medium. A method is presented to solve for shear elasticity and viscosity of a homogeneous medium by measuring shear wave speed dispersion. Harmonic radiation force, introduced by modulating the energy density of incident ultrasound, is used to generate cylindrical shear waves of various frequencies in a homogeneous medium. The speed of shear waves is measured from phase shift detected over the distance propagated. Measurements of shear wave speed at multiple frequencies are fit with the theoretical model to solve for the complex stiffness of the medium. Experiments in gelatin phantoms show promising results validated by an independent method. Practical considerations and challenges in possible medical applications are discussed.
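The dispersion-fitting step can be sketched as a least-squares fit of the Voigt-model phase-velocity formula to measured speeds. The density, frequencies, and "measured" data below are invented for illustration; only the Voigt dispersion relation itself is standard:

```python
import numpy as np
from scipy.optimize import curve_fit

RHO = 1000.0  # assumed density of the phantom, kg/m^3

def voigt_speed(omega, mu1, mu2):
    """Shear-wave phase velocity in a Voigt solid with elasticity mu1 (Pa)
    and viscosity mu2 (Pa*s)."""
    m = np.sqrt(mu1**2 + (omega * mu2) ** 2)
    return np.sqrt(2.0 * m**2 / (RHO * (mu1 + m)))

# Synthetic "measured" dispersion data at a few shear-wave frequencies
freqs = np.array([100.0, 200.0, 300.0, 400.0, 500.0])  # Hz
omega = 2.0 * np.pi * freqs
c_meas = voigt_speed(omega, 2500.0, 1.2)

# Fit the model to the dispersion curve to recover the complex stiffness
(mu1_fit, mu2_fit), _ = curve_fit(voigt_speed, omega, c_meas,
                                  p0=(1000.0, 0.5), bounds=(0.0, np.inf))
```

With noise-free synthetic data the fit recovers the elasticity and viscosity used to generate it; real data would add measurement noise to the speeds.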

387 citations


Journal ArticleDOI
TL;DR: The results show that normalization procedures that use information across multiple vowels to normalize a single vowel token performed better than those that include only information contained in the vowel token itself ("vowel-intrinsic" information).
Abstract: An evaluation of vowel normalization procedures for the purpose of studying language variation is presented. The procedures were compared on how effectively they (a) preserve phonemic information, (b) preserve information about the talker's regional background (or sociolinguistic information), and (c) minimize anatomical/physiological variation in acoustic representations of vowels. Recordings were made for 80 female talkers and 80 male talkers of Dutch. These talkers were stratified according to their gender and regional background. The normalization procedures were applied to measurements of the fundamental frequency and the first three formant frequencies for a large set of vowel tokens. The normalization procedures were evaluated through statistical pattern analysis. The results show that normalization procedures that use information across multiple vowels ("vowel-extrinsic" information) to normalize a single vowel token performed better than those that include only information contained in the vowel token itself ("vowel-intrinsic" information). Furthermore, the results show that normalization procedures that operate on individual formants performed better than those that use information across multiple formants (e.g., "formant-extrinsic" F2-F1).
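As a concrete instance of the "vowel-extrinsic" class, Lobanov's z-score procedure normalizes each formant using statistics pooled over all of a talker's vowel tokens. This is a well-known member of that class, shown here purely for illustration, not necessarily the procedure the study ranked best:

```python
import numpy as np

def lobanov_normalize(formants):
    """Z-score each formant over all vowel tokens of ONE talker
    (Lobanov's method): subtract the talker's mean and divide by the
    talker's standard deviation, formant by formant.

    formants: (n_tokens, n_formants) array of Hz values for one talker.
    """
    f = np.asarray(formants, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0)
```

After normalization, each talker's formants have zero mean and unit variance, removing vocal-tract scale differences while preserving the relative positions of that talker's vowels.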

379 citations


Journal ArticleDOI
TL;DR: It is shown that quantitative viscosity mapping is still possible if one uses an appropriate inverse problem that fully takes into account diffraction in solids.
Abstract: Two main questions are at the center of this paper. The first one concerns the choice of a rheological model in the frequency range of transient elastography, sonoelasticity or NMR elastography for soft solids (20-1000 Hz). Transient elastography experiments based on plane shear waves that propagate in an Agar-gelatin phantom or in bovine muscles enable one to quantify their viscoelastic properties. The comparison of these experimental results to the predictions of the two simplest rheological models indicates clearly that Voigt's model is the better. The second question studied in the paper deals with the feasibility of quantitative viscosity mapping using an inverse problem algorithm. In the ideal situation where plane shear waves propagate in a sample, a simple inverse problem based on the Helmholtz equation correctly retrieves both elasticity and viscosity. In a more realistic situation with nonplane shear waves, this simple approach fails. Nevertheless, it is shown that quantitative viscosity mapping is still possible if one uses an appropriate inverse problem that fully takes into account diffraction in solids.
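For the ideal plane-wave case, the Helmholtz inversion amounts to G = −ρω²u/∇²u, with a Voigt modulus G = μ1 + iωμ2. A toy 1-D reconstruction on a synthetic field (all parameter values invented; the sign of the complex wave number root is irrelevant here because only the ratio u''/u enters):

```python
import numpy as np

RHO = 1000.0                      # assumed density, kg/m^3
OMEGA = 2.0 * np.pi * 100.0       # 100-Hz shear wave
MU1, MU2 = 3000.0, 1.0            # Voigt parameters used to build the field
G_TRUE = MU1 + 1j * OMEGA * MU2   # complex shear modulus

# Synthetic plane shear wave exp(ikx); the complex wave number carries
# both the propagation speed and the viscous attenuation
x = np.linspace(0.0, 0.1, 2001)
dx = x[1] - x[0]
k = np.sqrt(RHO * OMEGA**2 / G_TRUE)
u = np.exp(1j * k * x)

# Helmholtz inversion: G = -rho * omega^2 * u / laplacian(u)
d2u = np.gradient(np.gradient(u, dx), dx)
ratio = -RHO * OMEGA**2 * u[2:-2] / d2u[2:-2]  # skip one-sided edge stencils
G_est = np.median(ratio.real) + 1j * np.median(ratio.imag)
mu1_est, mu2_est = G_est.real, G_est.imag / OMEGA
```

The real part of the recovered modulus gives elasticity and the imaginary part (divided by ω) gives viscosity; with nonplane waves this pointwise ratio fails, which is the paper's motivation for a diffraction-aware inverse problem.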

371 citations


Journal ArticleDOI
TL;DR: These experiments provide strong preliminary support for retaining residual low-frequency acoustic hearing in cochlear implant patients and are consistent with the idea that better perception of voice pitch, which can aid in separating voices in a background of other talkers, was responsible for this advantage.
Abstract: The purpose of this study was to explore the potential advantages, both theoretical and applied, of preserving low-frequency acoustic hearing in cochlear implant patients. Several hypotheses are presented that predict that residual low-frequency acoustic hearing along with electric stimulation for high frequencies will provide an advantage over traditional long-electrode cochlear implants for the recognition of speech in competing backgrounds. A simulation experiment in normal-hearing subjects demonstrated a clear advantage for preserving low-frequency residual acoustic hearing for speech recognition in a background of other talkers, but not in steady noise. Three subjects with an implanted "short-electrode" cochlear implant and preserved low-frequency acoustic hearing were also tested on speech recognition in the same competing backgrounds and compared to a larger group of traditional cochlear implant users. Each of the three short-electrode subjects performed better than any of the traditional long-electrode implant subjects for speech recognition in a background of other talkers, but not in steady noise, in general agreement with the simulation studies. When compared to a subgroup of traditional implant users matched according to speech recognition ability in quiet, the short-electrode patients showed a 9-dB advantage in the multitalker background. These experiments provide strong preliminary support for retaining residual low-frequency acoustic hearing in cochlear implant patients. The results are consistent with the idea that better perception of voice pitch, which can aid in separating voices in a background of other talkers, was responsible for this advantage.

366 citations


PatentDOI
Takaaki Nakamura1
TL;DR: In this paper, the authors present an acoustic vibration analyzing apparatus for carrying out acoustic vibration analysis by picking up data of sounds generated due to rotation of a plurality of gears and data of the number of revolutions of a gear selected from the plurality of gears when a transmission of a vehicle having the plurality of gears operates.
Abstract: The present invention provides an acoustic vibration analyzing apparatus for carrying out acoustic vibration analysis by picking up data of sounds generated due to rotation of a plurality of gears and data of the number of revolutions of a gear selected from a plurality of gears when a transmission of a vehicle having the plurality of gears operates. The acoustic vibration analyzing apparatus comprises an acoustic vibration calculation portion for analyzing acoustic data in terms of frequency, an order calculation portion for calculating an order in compliance with the specifications of a plurality of gears, a speed calculation portion for calculating the speed of a vehicle, and a display unit for displaying acoustic pressure levels with the order and vehicle speed associated therewith.
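The "order calculation portion" rests on the standard order-analysis relations, which for a gear reduce to the tooth-count arithmetic below (a generic sketch of the underlying relations, not the patented apparatus):

```python
def shaft_frequency(rpm):
    """Shaft rotation frequency in Hz."""
    return rpm / 60.0

def gear_mesh_frequency(rpm, n_teeth):
    """Gear-mesh frequency in Hz: shaft revolutions per second times the
    tooth count of the gear on that shaft."""
    return shaft_frequency(rpm) * n_teeth

def order_of(frequency_hz, rpm):
    """Order = frequency normalized by shaft rotation frequency, so a
    gear's mesh component appears at an order equal to its tooth count."""
    return frequency_hz / shaft_frequency(rpm)
```

Displaying sound pressure level against order rather than raw frequency keeps each gear's signature at a fixed position as vehicle speed varies.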

336 citations


PatentDOI
TL;DR: In this article, a golf club head formed of multiple materials is disclosed, and the material beyond what is required to maintain structural integrity is removed and replaced with a lightweight material, freeing up mass that can be redistributed to other, more beneficial locations of the club head.
Abstract: A golf club head formed of multiple materials is disclosed. Those portions of the club head that are subject to high stresses during normal use of the golf club head are formed of a metallic material. Most of the material beyond what is required to maintain structural integrity, however, is removed and replaced with a lightweight material. This frees up mass that can be redistributed to other, more beneficial locations of the club head. The lightweight material also damps vibrations generated during use of the golf club. This vibration damper may be retained in a state of compression to enhance the vibration damping. One or more weight members may be included to obtain desired center of gravity position, moments of inertia, and other club head attributes. An insert formed of multiple materials and having regions of varying thickness may also be included on a rear surface of the club head.

Journal ArticleDOI
TL;DR: The source directions suggested by the selected ITD and ILD cues are shown to imply the results of a number of published psychophysical studies related to source localization in the presence of distracters, as well as in precedence effect conditions.
Abstract: In everyday complex listening situations, sound emanating from several different sources arrives at the ears of a listener both directly from the sources and as reflections from arbitrary directions. For localization of the active sources, the auditory system needs to determine the direction of each source, while ignoring the reflections and superposition effects of concurrently arriving sound. A modeling mechanism with these desired properties is proposed. Interaural time difference (ITD) and interaural level difference (ILD) cues are only considered at time instants when only the direct sound of a single source has non-negligible energy in the critical band and, thus, when the evoked ITD and ILD represent the direction of that source. It is shown how to identify such time instants as a function of the interaural coherence (IC). The source directions suggested by the selected ITD and ILD cues are shown to imply the results of a number of published psychophysical studies related to source localization in the presence of distracters, as well as in precedence effect conditions.
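The cue-selection idea can be sketched in a toy broadband, time-domain form: per frame, take the interaural lag that maximizes the normalized cross-correlation (the IC), and keep the resulting ITD only when the IC is near 1, i.e. when the frame is plausibly dominated by the direct sound of one source. The sampling rate, frame length, and threshold below are arbitrary, and the actual model operates per critical band with a running coherence estimate:

```python
import numpy as np

FS = 16000                   # sampling rate, Hz (arbitrary)
MAX_LAG = int(0.001 * FS)    # search ITDs up to +/-1 ms

def select_itd_cues(left, right, frame=256, ic_threshold=0.98):
    """Per frame, find the interaural lag maximizing the normalized
    cross-correlation (IC), and keep the ITD only when IC exceeds the
    threshold -- i.e., only at near-coherent instants."""
    itds = []
    for start in range(0, len(left) - frame, frame):
        l = left[start:start + frame]
        r = right[start:start + frame]
        best_ic, best_lag = -1.0, 0
        for lag in range(-MAX_LAG, MAX_LAG + 1):
            if lag >= 0:
                a, b = l[lag:], r[:len(r) - lag]
            else:
                a, b = l[:len(l) + lag], r[-lag:]
            denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
            ic = np.dot(a, b) / denom if denom > 0.0 else 0.0
            if ic > best_ic:
                best_ic, best_lag = ic, lag
        if best_ic >= ic_threshold:
            itds.append(best_lag / FS)
    return np.array(itds)
```

Frames dominated by superposed sound from several directions produce low IC and contribute no cue, which is exactly the property that lets the model ignore reflections and overlapping sources.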

PatentDOI
TL;DR: In this article, sentence-based queries from a user are analyzed using a natural language engine to determine appropriate answers from an electronic database, which is useful for Internet based search engines, as well as distributed speech recognition systems such as a client-server system.
Abstract: Sentence-based queries from a user are analyzed using a natural language engine to determine appropriate answers from an electronic database. The system and methods are useful for Internet-based search engines, as well as distributed speech recognition systems such as a client-server system. The latter are typically implemented on an intranet or over the Internet based on user queries entered at a computer, PDA, or workstation using a speech input interface.

Journal ArticleDOI
TL;DR: The results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers; this masking may originate from increased target-masker similarity when spectral resolution is reduced.
Abstract: Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. The performance of the simulation and implant groups did not improve when the single-talker masker was a different talker compared to the same talker as the target speech, as was found in the normal-hearing control. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced.

Journal ArticleDOI
TL;DR: In this article, a definition of a local diffuse field applicable to open heterogeneous systems is proposed and is shown, using a reciprocity argument, to lead to the familiar identity between the local Green's function of the structure and the diffuse-field correlations.
Abstract: As is now well known, the relation between diffuse field correlations and the Green’s function follows directly from a definition of a diffuse field as an uncorrelated smooth spectral superposition of normal modes. Such a definition is, however, inapplicable in most open structures, the earth in particular. A preferable definition might be that of room acoustics: a diffuse field is an uncorrelated isotropic superposition of plane waves. But that definition is inapplicable to heterogeneous structures, or near boundaries. Here, a definition of a local diffuse field applicable to open heterogeneous systems is proposed. A local diffuse field is taken to be one in steady‐state equilibrium with the field in a homogeneous region having an uncorrelated isotropic superposition of incident plane waves. This definition is applicable to both heterogeneous and open systems, and is shown using a reciprocity argument to lead to the familiar identity between the local Green’s function of the structure and the diffuse fields correlations.

Journal ArticleDOI
TL;DR: A linear integro-differential equation wave model was developed for the anomalous attenuation by using the space-fractional Laplacian operation, and the strategy is then extended to the nonlinear Burgers equation.
Abstract: Frequency-dependent attenuation typically obeys an empirical power law with an exponent ranging from 0 to 2. The standard time-domain partial differential equation models can describe merely two extreme cases of frequency-independent and frequency-squared dependent attenuations. The otherwise nonzero and nonsquare frequency dependency occurring in many cases of practical interest is thus often called the anomalous attenuation. In this study, a linear integro-differential equation wave model was developed for the anomalous attenuation by using the space-fractional Laplacian operation, and the strategy is then extended to the nonlinear Burgers equation. A new definition of the fractional Laplacian is also introduced which naturally includes the boundary conditions and has inherent regularization to ease the hypersingularity in the conventional fractional Laplacian. Under Szabo's smallness approximation, where attenuation is assumed to be much smaller than the wave number, the linear model is found consistent with arbitrary frequency power-law dependency.
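The conventional fractional Laplacian that the paper's regularized definition improves upon can be written spectrally as multiplication by |k|^s. A periodic 1-D sketch of that textbook Fourier definition (not the paper's new definition, which additionally builds in boundary conditions):

```python
import numpy as np

def fractional_laplacian_1d(u, dx, s):
    """Apply (-Delta)^(s/2) to a periodic 1-D signal via its Fourier
    multiplier |k|^s. This is the conventional spectral definition; a
    regularized definition would also handle boundary conditions and
    tame the hypersingularity of the real-space kernel."""
    k = 2.0 * np.pi * np.fft.fftfreq(len(u), d=dx)
    return np.real(np.fft.ifft(np.abs(k) ** s * np.fft.fft(u)))
```

For u(x) = sin(mx) on a 2π-periodic grid the operator returns m^s·sin(mx), reducing to −u'' when s = 2; intermediate s values are what generate the intermediate power-law attenuation exponents.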

Journal ArticleDOI
TL;DR: The results suggest that informational masking can be overcome by factors that improve listeners' auditory attention toward the target.
Abstract: Three experiments investigated factors that influence the creation of and release from informational masking in speech recognition. The target stimuli were nonsense sentences spoken by a female talker. In experiment 1 the masker was a mixture of three, four, six, or ten female talkers, all reciting similar nonsense sentences. Listeners' recognition performance was measured with both target and masker presented from a front loudspeaker (F-F) or with a masker presented from two loudspeakers, with the right leading the front by 4 ms (F-RF). In the latter condition the target and masker appear to be from different locations. This aids recognition performance for one- and two-talker maskers, but not for noise. As the number of masking talkers increased to ten, the improvement in the F-RF condition diminished, but did not disappear. The second experiment investigated whether hearing a preview (prime) of the target sentence before it was presented in masking improved recognition for the last key word, which was not included in the prime. Marked improvements occurred only for the F-F condition with two-talker masking, not for continuous noise or F-RF two-talker masking. The third experiment found that the benefit of priming in the F-F condition was maintained if the prime sentence was spoken by a different talker or even if it was printed and read silently. These results suggest that informational masking can be overcome by factors that improve listeners' auditory attention toward the target.

Journal ArticleDOI
TL;DR: In this article, a theoretical analysis of plane-wave decomposition given the sound pressure on a sphere is presented, where the amplitudes of the incident plane waves can be calculated as a spherical convolution between the pressure on the sphere and another function which depends on frequency and the sphere radius.
Abstract: Spherical microphone arrays have been recently studied for sound analysis and sound recordings, which have the advantage of spherical symmetry facilitating three-dimensional analysis. This paper complements the recent microphone array design studies by presenting a theoretical analysis of plane-wave decomposition given the sound pressure on a sphere. The analysis uses the spherical Fourier transform and the spherical convolution, where it is shown that the amplitudes of the incident plane waves can be calculated as a spherical convolution between the pressure on the sphere and another function which depends on frequency and the sphere radius. The spatial resolution of plane-wave decomposition given limited bandwidth in the spherical Fourier domain is formulated, and ways to improve the computation efficiency of plane-wave decomposition are introduced. The paper concludes with a simulation example of plane-wave decomposition.
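The deconvolution at the heart of plane-wave decomposition divides each spherical-harmonic pressure coefficient by a mode strength b_n(ka). A sketch for an open-sphere array (rigid-sphere arrays use a different b_n with a radial-derivative term); the dict-based interface is my own simplification:

```python
import numpy as np
from scipy.special import spherical_jn

def open_sphere_bn(n, ka):
    """Mode strength of an open-sphere array: b_n(ka) = 4*pi*i^n*j_n(ka)."""
    return 4.0 * np.pi * (1j ** n) * spherical_jn(n, ka)

def plane_wave_amplitudes(p_nm, ka):
    """The spherical-deconvolution step of plane-wave decomposition:
    divide each measured spherical-harmonic pressure coefficient by the
    mode strength of its order.  p_nm: dict {(n, m): coefficient}."""
    return {(n, m): c / open_sphere_bn(n, ka) for (n, m), c in p_nm.items()}
```

The division blows up near the zeros of j_n(ka), which is one face of the bandwidth/resolution trade-off the paper formulates; practical designs regularize or band-limit this step.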

PatentDOI
TL;DR: An ultrasound system has a catheter including an elongate flexible catheter body having at least one lumen extending longitudinally therethrough as mentioned in this paper, which provides a method for reverse irrigation and removal of particles.
Abstract: An ultrasound system has a catheter including an elongate flexible catheter body having at least one lumen extending longitudinally therethrough. The catheter further includes an ultrasound transmission member extending longitudinally through the lumen of the catheter body, the ultrasound transmission member having a proximal end connectable to a separate ultrasound generating device and a distal end coupled to the distal end of the catheter body. The distal end of the catheter body is deflectable. The ultrasound system also includes a sonic connector that connects the ultrasound transmission member to an ultrasound transducer. The ultrasound system also provides a method for reverse irrigation and removal of particles.

Journal ArticleDOI
TL;DR: In this paper, a method to obtain coherent acoustic wave fronts by measuring the space-time correlation function of ocean noise between two hydrophones is experimentally demonstrated, which exhibits deterministic waveguide arrival structure embedded in the time-domain Green's function.
Abstract: A method to obtain coherent acoustic wave fronts by measuring the space–time correlation function of ocean noise between two hydrophones is experimentally demonstrated. Though the sources of ocean noise are uncorrelated, the time-averaged noise correlation function exhibits deterministic waveguide arrival structure embedded in the time-domain Green’s function. A theoretical approach is derived for both volume and surface noise sources. Shipping noise is also investigated and simulated results are presented for deep and shallow water configurations. The data of opportunity used to demonstrate the extraction of wave fronts from ocean noise were taken from the synchronized vertical receive arrays used in the frame of the North Pacific Acoustic Laboratory (NPAL) experiment during time intervals when no source was transmitting.
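The core idea is that the time-averaged cross-correlation of the two noise records peaks at the inter-hydrophone travel time. A toy example with white noise and a single delayed path (a real diffuse field would produce both causal and anticausal branches of the Green's function, and the delay here is an invented number):

```python
import numpy as np

rng = np.random.default_rng(1)
FS = 1000                    # Hz
T = 20 * FS                  # averaging time: 20 s of noise
delay = 25                   # propagation delay between hydrophones (samples)

noise = rng.standard_normal(T)
h1 = noise                                              # hydrophone 1
h2 = np.concatenate([np.zeros(delay), noise[:-delay]])  # hydrophone 2

# Time-averaged cross-correlation; its peak emerges at the travel time
lags = np.arange(-100, 101)
corr = np.array([np.dot(h1[max(0, -l):T - max(0, l)],
                        h2[max(0, l):T - max(0, -l)]) for l in lags]) / T
travel_time = lags[np.argmax(corr)] / FS   # seconds
```

The longer the averaging time, the further the coherent arrival rises above the incoherent correlation floor, which is why long time averages of ambient noise were needed in the NPAL data.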

Journal ArticleDOI
TL;DR: This article showed that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility, and that these acoustic properties could lead to improved signal processing schemes for hearing aids.
Abstract: Sentences spoken “clearly” are significantly more intelligible than those spoken “conversationally” for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96–103 (1985); Uchanski et al., ibid. 39, 494–509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581–1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434–446 (1986); Uchanski et al., ibid. 39, 494–509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165–2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000–3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
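The two global-level properties can be operationalized roughly as below. The frame size, modulation band, and the particular definition of modulation depth are my assumptions for illustration, not the paper's exact measurement procedures:

```python
import numpy as np

def band_energy_fraction(x, fs, lo=1000.0, hi=3000.0):
    """Fraction of long-term spectral energy in [lo, hi] Hz."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spec[(f >= lo) & (f <= hi)].sum() / spec.sum()

def modulation_depth(x, fs, frame=0.016, mod_band=(1.0, 16.0)):
    """RMS of the 1-16-Hz components of the short-time intensity
    envelope, relative to the envelope mean -- one way to quantify the
    depth of low-frequency modulations of the intensity envelope."""
    hop = int(frame * fs)
    env = np.array([np.mean(x[i:i + hop] ** 2)
                    for i in range(0, len(x) - hop + 1, hop)])
    spec = np.fft.rfft(env - env.mean())
    f = np.fft.rfftfreq(len(env), d=frame)
    spec[(f < mod_band[0]) | (f > mod_band[1])] = 0.0
    band_env = np.fft.irfft(spec, n=len(env))
    return np.sqrt(np.mean(band_env ** 2)) / env.mean()
```

Applied to clear versus conversational recordings at matched rates, higher values of both measures would correspond to the two global-level differences the study identified.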

Journal ArticleDOI
TL;DR: The ultrasonic axial transmission technique, used to assess cortical shells of long bones, is investigated using numerical simulations based on a three-dimensional (3D) finite difference code and it is shown that the cortical depth that contributes to lateral wave SOS measurement is approximately 1-1.5 mm under classical in vivo measurement conditions.
Abstract: The ultrasonic axial transmission technique, used to assess cortical shells of long bones, is investigated using numerical simulations based on a three-dimensional (3D) finite difference code. We focus our interest on the effects of 3D cortical bone geometry (curvature, cortical thickness), anisotropy, and microporosity on speed of sound (SOS) measurements for different frequencies in the MHz range. We first show that SOS values measured on tubular cortical shells are identical to those measured on cortical plates of equal thickness. Anisotropy of cortical bone is then shown to have a major impact on SOS measurement as a function of cortical thickness. The range of SOS values measured on anisotropic bone is half the range found when bone is considered isotropic. Dependence on thickness occurs for cortical shells thinner than 0.5×λ_bone in anisotropic bone (λ_bone: wavelength in bone), whereas it occurs for cortical shells thinner than λ_bone when anisotropy is neglected. Sensitivity of SOS along the bone axis ...

Journal ArticleDOI
TL;DR: A correlation-based algorithm for the automatic measurement of fundamental frequency and open quotient using the derivative of electroglottographic signals is proposed and it is shown that agreement with the glottal-flow measurements is much better than most threshold-based measurements in the case of sustained sounds.
Abstract: Electroglottography is a common method for providing noninvasive measurements of glottal activity. The derivative of the electroglottographic signal, however, has not attracted much attention, although it yields reliable indicators of glottal closing instants. The purpose of this paper is to provide a guide to the usefulness of this signal. The main features that are to be found in this signal are presented on the basis of an extensive analysis of a database of items sung by 18 trained singers. Glottal opening and closing instants are related to peaks in the signal; the latter can be used to measure glottal parameters such as fundamental frequency and open quotient. In some cases, peaks are doubled or imprecise, which points to special (but by no means uncommon) glottal configurations. A correlation-based algorithm for the automatic measurement of fundamental frequency and open quotient using the derivative of electroglottographic signals is proposed. It is compared to three other electroglottographic-based methods with regard to the measurement of open quotient in inverse-filtered derived glottal flow. It is shown that agreement with the glottal-flow measurements is much better than most threshold-based measurements in the case of sustained sounds.
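The peak-picking step on the dEGG can be sketched as below. The sign convention (vocal-fold contact raises the EGG signal, so closure appears as a sharp positive peak in its derivative) and the height/distance thresholds are assumptions, and the synthetic test signal is far simpler than real EGG data with doubled or imprecise peaks:

```python
import numpy as np
from scipy.signal import find_peaks

def degg_cycles(egg, fs, max_f0=500.0):
    """Locate glottal closing instants as the dominant positive peaks of
    the EGG derivative (dEGG) and estimate F0 from their spacing.
    Assumed sign convention: increasing vocal-fold contact raises the
    EGG signal, so closure shows as a sharp positive dEGG peak."""
    degg = np.diff(egg)
    peaks, _ = find_peaks(degg, height=0.5 * degg.max(),
                          distance=int(fs / max_f0))  # F0 assumed < max_f0
    f0 = fs / np.mean(np.diff(peaks)) if len(peaks) > 1 else None
    return peaks, f0
```

Open quotient estimation additionally requires the (weaker, often doubled) opening peaks in the negative-going part of the dEGG, which is where the paper's correlation-based method improves on simple thresholds.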

Journal ArticleDOI
TL;DR: In this paper, the authors describe new algorithms, not previously available, for predicting atmospheric absorption of sound at high altitudes, and a basis for estimating atmospheric absorption up to 160 km is described.
Abstract: This paper describes new algorithms, not previously available, for predicting atmospheric absorption of sound at high altitudes. A basis for estimating atmospheric absorption up to 160 km is described. The estimated values at altitudes above 90 km must be considered as only approximate due to uncertainties about the composition of the atmosphere above 90 km and simplifying assumptions. At high altitudes, classical and rotational relaxation absorption are dominant, as opposed to absorption by molecular vibrational relaxation that is the principal atmospheric absorption loss mechanism for primary sonic booms propagating downward from a cruising supersonic aircraft. Classical and rotational relaxation absorption varies inversely with atmospheric pressure, thus increasing in magnitude at high altitudes as atmospheric pressure falls. However, classical and rotational losses also relax at the high values of frequency/pressure reached at high altitudes and thus, for audio and infrasonic frequencies, begin to dec...

Journal ArticleDOI
TL;DR: The speech intelligibility index (SII) concept for estimating intelligibility is extended to include broadband peak-clipping and center-clipping distortion, with the coherence between the input and output signals used to estimate the noise and distortion effects.
Abstract: The speech intelligibility index (SII) (ANSI S3.5-1997) provides a means for estimating speech intelligibility under conditions of additive stationary noise or bandwidth reduction. The SII concept for estimating intelligibility is extended in this paper to include broadband peak-clipping and center-clipping distortion, with the coherence between the input and output signals used to estimate the noise and distortion effects. The speech intelligibility predictions using the new procedure are compared with intelligibility scores obtained from normal-hearing and hearing-impaired subjects for conditions of additive noise and peak-clipping and center-clipping distortion. The most effective procedure divides the speech signal into low-, mid-, and high-level regions, computes the coherence SII separately for the signal segments in each region, and then estimates intelligibility from a weighted combination of the three coherence SII values.

Journal ArticleDOI
TL;DR: It is concluded that the frequently reported disproportionate difficulty of non-native listening under disadvantageous conditions is not due to a disproportionate increase in phoneme misidentifications.
Abstract: Native American English and non-native (Dutch) listeners identified either the consonant or the vowel in all possible American English CV and VC syllables. The syllables were embedded in multispeaker babble at three signal-to-noise ratios (0, 8, and 16 dB). The phoneme identification performance of the non-native listeners was less accurate than that of the native listeners. All listeners were adversely affected by noise. With these isolated syllables, initial segments were harder to identify than final segments. Crucially, the effects of language background and noise did not interact; the performance asymmetry between the native and non-native groups was not significantly different across signal-to-noise ratios. It is concluded that the frequently reported disproportionate difficulty of non-native listening under disadvantageous conditions is not due to a disproportionate increase in phoneme misidentifications.

Journal ArticleDOI
TL;DR: The EFR techniques investigated here might be developed into a clinically useful objective estimate of temporal auditory acuity for subjects who cannot provide reliable behavioral responses.
Abstract: Temporal auditory acuity, the ability to discriminate rapid changes in the envelope of a sound, is essential for speech comprehension. Human envelope following responses (EFRs) recorded from scalp electrodes were evaluated as an objective measurement of temporal processing in the auditory nervous system. The temporal auditory acuity of older and younger participants was measured behaviorally using both gap and modulation detection tasks. These findings were then related to EFRs evoked by white noise that was amplitude modulated (25% modulation depth) with a sweep of modulation frequencies from 20 to 600 Hz. The frequency at which the EFR was no longer detectable was significantly correlated with behavioral measurements of gap detection (r=−0.43), and with the maximum perceptible modulation frequency (r=0.72). The EFR techniques investigated here might be developed into a clinically useful objective estimate of temporal auditory acuity for subjects who cannot provide reliable behavioral responses.

PatentDOI
Kuansan Wang1
TL;DR: In this paper, a language model consisting of an N-gram language model and a context-free grammar language model is used to store information related to words and semantic information to be recognized.
Abstract: A speech understanding system includes a language model comprising a combination of an N-gram language model and a context-free grammar language model. The language model stores information related to words and semantic information to be recognized. A module is adapted to receive input from a user and capture the input for processing. The module is further adapted to receive SALT application program interfaces pertaining to recognition of the input. The module is configured to process the SALT application program interfaces and the input to ascertain semantic information pertaining to a first portion of the input and output a semantic object comprising text and semantic information for the first portion by accessing the language model, wherein performing recognition and outputting the semantic object are performed while capturing continues for subsequent portions of the input.

Journal ArticleDOI
TL;DR: It was found that noise levels inside classrooms depend upon the activities in which the children are engaged, with a difference of 20 dB L(Aeq) between the "quietest" and "noisiest" activities.
Abstract: Internal and external noise surveys have been carried out around schools in London, UK, to provide information on typical levels and sources to which children are exposed while at school. Noise levels were measured outside 142 schools, in areas away from flight paths into major airports. 86% of the schools surveyed were exposed to noise from road traffic, with an average external noise level outside a school of 57 dB L(Aeq). Detailed internal noise surveys have been carried out in 140 classrooms in 16 schools, together with classroom observations. It was found that noise levels inside classrooms depend upon the activities in which the children are engaged, with a difference of 20 dB L(Aeq) between the "quietest" and "noisiest" activities. The average background noise level in classrooms exceeds the level recommended in current standards. The number of children in the classroom was found to affect noise levels. External noise influenced internal noise levels only when children were engaged in the quietest classroom activities. The effects of the age of the school buildings and types of window upon internal noise were examined but results were inconclusive.

PatentDOI
Robert D. Strong1
TL;DR: In this article, a plurality of speech rules is generated, each of the rules comprising a language model and an expression associated with the language model, and actions are performed in the system according to the expressions associated with each language model in the set of rules.
Abstract: Assigning meanings to spoken utterances in a speech recognition system. A plurality of speech rules is generated, each of the speech rules comprising a language model and an expression associated with the language model. At one interval (e.g. upon the detection of speech in the system), a current language model is generated from each language model in the speech rules for use by a recognizer. When a sequence of words is received from the recognizer, a set of speech rules which match the sequence of words received from the recognizer is determined. Each expression associated with the language model in each of the set of speech rules is evaluated, and actions are performed in the system according to the expressions associated with each language model in the set of speech rules.
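The rule scheme described above can be modeled as a small data structure: each rule pairs a language model with an expression, matching rules are selected against the recognized word sequence, and their expressions are evaluated to produce actions. This is a toy illustration; the class and field names are invented for the sketch and the "language model" is reduced to a set of trigger words, far simpler than in the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class SpeechRule:
    trigger_words: Set[str]                 # toy stand-in for a language model
    expression: Callable[[List[str]], str]  # evaluated when the rule matches

def matching_rules(rules, words):
    """Return the rules whose language model matches the word sequence."""
    return [r for r in rules if r.trigger_words & set(words)]

# Hypothetical rules: match a trigger word, emit an action string
rules = [
    SpeechRule({"open", "launch"}, lambda w: "open " + w[-1]),
    SpeechRule({"close", "quit"}, lambda w: "close " + w[-1]),
]
words = "please open mail".split()          # recognizer output
actions = [r.expression(words) for r in matching_rules(rules, words)]
```

Here only the first rule matches "please open mail", so a single action is produced; in the patent's scheme the matched rules' expressions similarly drive the actions the system performs.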