
Showing papers in "Journal of the Acoustical Society of America in 2017"


Journal Article
TL;DR: A separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech and which substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility.
Abstract: Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation and it, without future frames, performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation.

230 citations
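
The time-frequency masking idea in the abstract above can be illustrated with a small sketch. It assumes the ideal ratio mask as the training target and uses made-up magnitudes; the abstract does not say which mask the authors estimate, and the names `ideal_ratio_mask` and `apply_mask` are hypothetical.

```python
import math

def ideal_ratio_mask(speech_mag, noise_mag):
    """Return the ideal ratio mask for one time-frequency grid.

    speech_mag, noise_mag: lists of rows (frames) of non-negative
    magnitudes, one value per frequency bin. Choosing this particular
    mask is an assumption made for illustration.
    """
    mask = []
    for s_row, n_row in zip(speech_mag, noise_mag):
        row = []
        for s, n in zip(s_row, n_row):
            total = s * s + n * n
            # Silent units get a mask of 0 to avoid dividing by zero.
            row.append(math.sqrt(s * s / total) if total > 0 else 0.0)
        mask.append(row)
    return mask

def apply_mask(mask, mixture_mag):
    """Scale each mixture magnitude by its mask value."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture_mag)]

# Toy magnitudes: 2 frames x 3 frequency bins (illustrative values only).
speech = [[1.0, 0.0, 3.0], [2.0, 2.0, 0.0]]
noise = [[1.0, 1.0, 4.0], [0.0, 2.0, 0.0]]
irm = ideal_ratio_mask(speech, noise)
```

In a supervised system, a network would be trained to predict such a mask from features of the noisy speech, and the predicted mask would then be applied to the mixture.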


Journal Article
TL;DR: Focusing requirements for ultrasonic neurostimulation are established through a review of previously employed ultrasonic parameters and consideration of deep brain targets. The k-space PSTD scheme performed as well as, or better than, the widely used FDTD scheme across all individual error tests and in the convergence of large-scale models, which recommends it for use in simulated TR.
Abstract: Non-invasive, focal neurostimulation with ultrasound is a potentially powerful neuroscientific tool that requires effective transcranial focusing of ultrasound to develop. Time-reversal (TR) focusing using numerical simulations of transcranial ultrasound propagation can correct for the effect of the skull, but relies on accurate simulations. Here, focusing requirements for ultrasonic neurostimulation are established through a review of previously employed ultrasonic parameters, and consideration of deep brain targets. The specific limitations of finite-difference time domain (FDTD) and k-space corrected pseudospectral time domain (PSTD) schemes are tested numerically to establish the spatial points per wavelength and temporal points per period needed to achieve the desired accuracy while minimizing the computational burden. These criteria are confirmed through convergence testing of a fully simulated TR protocol using a virtual skull. The k-space PSTD scheme performed as well as, or better than, the widely used FDTD scheme across all individual error tests and in the convergence of large scale models, recommending it for use in simulated TR. Staircasing was shown to be the most serious source of error. Convergence testing indicated that higher sampling is required to achieve fine control of the pressure amplitude at the target than is needed for accurate spatial targeting.

102 citations


Journal Article
TL;DR: Findings reveal that broad linguistic categories are reflected in the temporal modulation features of different languages, although this may depend on speaking style.
Abstract: Languages show systematic variation in their sound patterns and grammars. Accordingly, they have been classified into typological categories such as stress-timed vs syllable-timed, or Head-Complement (HC) vs Complement-Head (CH). To date, it has remained incompletely understood how these linguistic properties are reflected in the acoustic characteristics of speech in different languages. In the present study, the amplitude-modulation (AM) and frequency-modulation (FM) spectra of 1797 utterances in ten languages were analyzed. Overall, the spectra were found to be similar in shape across languages. However, significant effects of linguistic factors were observed on the AM spectra. These differences were magnified with a perceptually plausible representation based on the modulation index (a measure of the signal-to-noise ratio at the output of a logarithmic modulation filterbank): the maximum value distinguished between HC and CH languages, with the exception of Turkish, while the exact frequency of this ma...

98 citations


Journal Article
TL;DR: The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.
Abstract: Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.

97 citations
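
The preprocessing step described above, a normalized sample covariance matrix built from array pressures, might look roughly like the following. This is a minimal sketch that assumes each snapshot is scaled to unit norm before averaging; the exact normalization used by the authors is not given in the abstract, and `normalized_scm` is a hypothetical name.

```python
import math

def normalized_scm(snapshots):
    """Average the outer products p p^H of unit-norm snapshots.

    snapshots: list of complex pressure vectors, one entry per sensor.
    Returns an n x n matrix as nested lists. Snapshots with zero norm
    are not handled in this sketch.
    """
    n = len(snapshots[0])
    scm = [[0j] * n for _ in range(n)]
    for p in snapshots:
        norm = math.sqrt(sum(abs(x) ** 2 for x in p))
        unit = [x / norm for x in p]  # removes the unknown source level
        for i in range(n):
            for j in range(n):
                scm[i][j] += unit[i] * unit[j].conjugate()
    count = len(snapshots)
    return [[v / count for v in row] for row in scm]

# Two made-up snapshots on a three-sensor array.
snaps = [[1 + 1j, 2 - 1j, 0.5j], [0.3 - 2j, 1 + 0j, -1 + 1j]]
C = normalized_scm(snaps)
```

With this normalization the matrix is Hermitian with unit trace, so its entries could be flattened into a feature vector for a classifier or regressor.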


Journal Article
TL;DR: An algorithm is proposed to reconstruct both the SS and AA distributions using a time domain FWI methodology based on the fractional Laplacian wave equation, an adjoint field formulation, and a gradient-descent method.
Abstract: Ultrasound computed tomography (USCT) is a non-invasive imaging technique that provides information about the acoustic properties of soft tissues in the body, such as the speed of sound (SS) and acoustic attenuation (AA). Knowledge of these properties can improve the discrimination between benign and malignant masses, especially in breast cancer studies. Full wave inversion (FWI) methods for image reconstruction in USCT provide the best image quality compared to more approximate methods. Using FWI, the SS is usually recovered in the time domain, and the AA is usually recovered in the frequency domain. Nevertheless, as both properties can be obtained from the same data, it is desirable to have a common framework to reconstruct both distributions. In this work, an algorithm is proposed to reconstruct both the SS and AA distributions using a time domain FWI methodology based on the fractional Laplacian wave equation, an adjoint field formulation, and a gradient-descent method. The optimization code employs a Compute Unified Device Architecture version of the software k-Wave, which provides high computational efficiency. The performance of the method was evaluated using simulated noisy data from numerical breast phantoms. Errors were less than 0.5% in the recovered SS and 10% in the AA.

85 citations


Journal Article
TL;DR: This study examines a near-field acoustic holography method consisting of a sparse formulation of the equivalent source method, based on the compressive sensing (CS) framework, and results on a classical guitar and a highly reactive dipole-like source are presented.
Abstract: This study examines a near-field acoustic holography method consisting of a sparse formulation of the equivalent source method, based on the compressive sensing (CS) framework. The method, denoted Compressive-Equivalent Source Method (C-ESM), encourages spatially sparse solutions (based on the superposition of few waves) that are accurate when the acoustic sources are spatially localized. The importance of obtaining a non-redundant representation, i.e., a sensing matrix with low column coherence, and the inherent ill-conditioning of near-field reconstruction problems is addressed. Numerical and experimental results on a classical guitar and on a highly reactive dipole-like source are presented. C-ESM is valid beyond the conventional sampling limits, making wide-band reconstruction possible. Spatially extended sources can also be addressed with C-ESM, although in this case the obtained solution does not recover the spatial extent of the source.

84 citations
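
The column coherence of a sensing matrix mentioned above has a standard definition: the largest normalized inner product between two distinct columns. The sketch below computes it for a small matrix; `column_coherence` is a hypothetical helper, and the matrices in the usage lines are toy values rather than an equivalent-source dictionary.

```python
import math

def column_coherence(matrix):
    """Largest normalized inner product between two distinct columns.

    matrix: list of rows with real or complex entries. Values near 0
    indicate a non-redundant dictionary, values near 1 indicate
    nearly parallel columns.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    cols = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    norms = [math.sqrt(sum(abs(x) ** 2 for x in col)) for col in cols]
    worst = 0.0
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            inner = sum(a.conjugate() * b for a, b in zip(cols[i], cols[j]))
            worst = max(worst, abs(inner) / (norms[i] * norms[j]))
    return worst

orthogonal = column_coherence([[1, 0], [0, 1]])
parallel = column_coherence([[1, 2], [1, 2]])
```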


Journal Article
TL;DR: Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources.
Abstract: Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.

75 citations


Journal Article
TL;DR: An inhomogeneous acoustic metamaterial lens based on spatial variation of refractive index for broadband focusing of underwater sound is reported, which has potential applications in medical ultrasound imaging and underwater acoustic communications.
Abstract: An inhomogeneous acoustic metamaterial lens based on spatial variation of refractive index for broadband focusing of underwater sound is reported. The index gradient follows a modified hyperbolic secant profile designed to reduce aberration and suppress side lobes. The gradient index (GRIN) lens is comprised of transversely isotropic hexagonal microstructures with tunable quasi-static bulk modulus and mass density. In addition, the unit cells are impedance-matched to water and have in-plane shear modulus negligible compared to the effective bulk modulus. The flat GRIN lens is fabricated by cutting hexagonal centimeter scale hollow microstructures in aluminum plates, which are then stacked and sealed from the exterior water. Broadband focusing effects are observed within the homogenization regime of the lattice in both finite element simulations and underwater measurements (20–40 kHz). This design approach has potential applications in medical ultrasound imaging and underwater acoustic communications.

73 citations


Journal Article
Jae Yeon Lee, Wonju Jeon
TL;DR: In this paper, a curvilinear shape of an acoustic black hole (ABH) using the simple mathematical geometry of an Archimedean spiral was proposed, which allows a uniform gap distance between adjacent baselines of the spiral.
Abstract: This study starts with a simple question: can the vibration of plates or beams be efficiently reduced using a lightweight structure that occupies a small space? As an efficient technique to damp vibration, the concept of an acoustic black hole (ABH) is adopted with a simple modification of the geometry. The original shape of an ABH is a straight wedge-type profile with power-law thickness, with the reduction of vibration in beams or plates increasing as the length of the ABH increases. However, in real-world applications, there exists an upper bound of the length of an ABH due to space limitations. Therefore, in this study, the authors propose a curvilinear shaped ABH using the simple mathematical geometry of an Archimedean spiral, which allows a uniform gap distance between adjacent baselines of the spiral. In numerical simulations, the damping performance increases as the arc length of the Archimedean spiral increases, regardless of the curvature of the spiral in the mid- and high-frequency ranges. Adding damping material to an ABH can also strongly enhance the damping performance while not significantly increasing the weight. In addition, the radiated sound power of a spiral ABH is similar to that of a standard ABH.

73 citations
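
The geometric property the abstract relies on, a uniform gap between adjacent turns of an Archimedean spiral, can be checked numerically. The sketch below assumes the simplest form r = b * theta with an illustrative pitch `b`; the actual spiral parameters and thickness profile used by the authors are not given in the abstract.

```python
import math

def spiral_radius(b, theta):
    """Radius of the Archimedean spiral r = b * theta."""
    return b * theta

def spiral_arc_length(b, theta):
    """Closed-form arc length of r = b * theta from 0 to theta."""
    return 0.5 * b * (theta * math.sqrt(1.0 + theta * theta) + math.asinh(theta))

def spiral_arc_length_numeric(b, theta, steps=20000):
    """Midpoint-rule check of the integral of b * sqrt(1 + t^2) dt."""
    h = theta / steps
    return sum(b * math.sqrt(1.0 + ((k + 0.5) * h) ** 2) * h for k in range(steps))

b = 0.002  # metres per radian, an illustrative value
# Radial gap between successive turns, sampled at three angles.
gaps = [spiral_radius(b, t + 2 * math.pi) - spiral_radius(b, t) for t in (0.5, 3.0, 10.0)]
length = spiral_arc_length(b, 6 * math.pi)  # three full turns
```

Every gap equals 2 * pi * b regardless of angle, which is why a long arc length can be packed into a compact footprint without adjacent turns touching.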


Journal Article
TL;DR: The authors use the causal constraint to delineate what is ultimately possible for sound-absorbing structures, and call structures that attain near-equality in the causal constraint "optimal".
Abstract: Causal nature of the acoustic response dictates an inequality that relates the absorption spectrum of the sample to its thickness. We use the causal constraint to delineate what is ultimately possible for sound absorbing structures, and denote those which can attain near-equality for the causal constraint to be “optimal.” By using acoustic metamaterial as backing to conventional porous absorbers, a design strategy is presented for realizing structures with target-set absorption spectra and a sample thickness close to the minimum value as dictated by causality. By using this approach, we have realized a 12 cm-thick structure that exhibits broadband, near-perfect flat absorption spectrum starting at around 400 Hz, while the minimum sample thickness as calculated from the causal constraint is 11.5 cm. To illustrate the versatility of the approach, two additional optimal structures with different target absorption spectra are presented. This “absorption by design” strategy enables the tailoring of customized solutions to difficult room acoustic and noise remediation problems. [Work done in collaboration with Min Yang, Shuyu Chen, and Caixing Fu.]

70 citations


Journal Article
TL;DR: Experimental results show that with only three cells, the proposed beam allows considerable vibration energy attenuation within an ultra-broad frequency range including the low frequency range, which conventional PCs can hardly reach.
Abstract: Band gaps in conventional phononic crystals (PCs) are attractive for applications such as vibration control, wave manipulation, and sound absorption. Their practical implementations, however, are hampered by several factors, chief among which are the large number of cells required and the impractically large cell size needed to place the stopbands at reasonably low frequencies. This paper reports a type of beam carved inside with two double-leaf acoustic black hole indentations. By incorporating the local resonance effect and the Bragg scattering effect generated by a strengthening stud connecting the two branches of the indentations, ultrawide band gaps are achieved. Increasing the length of the stud or reducing the residual thickness of the indentation tunes and significantly enlarges the band gaps, which can exceed 90% of the entire frequency range of interest. Experimental results show that with only three cells, the proposed beam allows considerable vibration energy attenuation within an ultra-broad frequency range including the low frequency range, which conventional PCs can hardly reach. Meanwhile, the proposed configuration also enhances the structural integrity, thus pointing at promising applications in vibration control and high-performance wave filter design.

Journal Article
TL;DR: An approximate analytical model is presented to investigate sound transmission, reflection and absorption of a rubber-like medium comprising a single layer of periodic cylindrical voids attached to a steel backing, modelled as a homogeneous medium with effective material and geometric properties.
Abstract: An approximate analytical model is presented to investigate sound transmission, reflection and absorption of a rubber-like medium comprising a single layer of periodic cylindrical voids attached to a steel backing. The layer of voids is modelled as a homogeneous medium with effective material and geometric properties. A numerical model based on the finite element method is developed to validate results from the homogenization model, as well as to show further insights into the physical mechanisms associated with the system acoustic performance. Monopole resonance of the voids is shown to reduce sound transmission through the voided medium due to increased reflection, resulting in poor sound absorption around this frequency. Peaks of high sound absorption are attributed to Fabry–Perot resonance with the frequency of the first peak derivable by a lumped spring-mass analogy. Sound absorption for a single layer of voids in a soft elastic medium with a steel backing is shown to be similar to the sound absorpti...

Journal Article
TL;DR: The acoustic waves generated during the motion of a bubble in water near a solid boundary are calculated numerically, and the sequence of events from bubble growth via axial microjet formation, jet impact, annular nanojet formation, torus-bubble collapse, and bubble rebound to second collapse is described.
Abstract: The acoustic waves being generated during the motion of a bubble in water near a solid boundary are calculated numerically. The open source package OpenFOAM is used for solving the Navier-Stokes equation and extended to include nonlinear acoustic wave effects via the Tait equation for water. A bubble model with a small amount of gas is chosen, the gas obeying an adiabatic law. A bubble starting from a small size with high internal pressure near a flat, solid boundary is studied. The sequence of events from bubble growth via axial microjet formation, jet impact, annular nanojet formation, torus-bubble collapse, and bubble rebound to second collapse is described. The different pressure and tension waves with their propagation properties are demonstrated.
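
For readers unfamiliar with the Tait equation named above, the following sketch evaluates one common form of it, p = (p_ref + B)(rho / rho_ref)^n - B, together with the sound speed it implies. The constants are commonly quoted ballpark values and are assumptions here; the abstract does not state the values used in the simulations, and this is not OpenFOAM code.

```python
import math

P_REF = 101325.0   # Pa, ambient pressure (assumed)
RHO_REF = 998.0    # kg/m^3, water density at ambient pressure (assumed)
B_TAIT = 3.046e8   # Pa, Tait constant (ballpark value, assumed)
N_TAIT = 7.15      # Tait exponent (ballpark value, assumed)

def tait_pressure(rho):
    """Pressure of water at density rho from the Tait equation."""
    return (P_REF + B_TAIT) * (rho / RHO_REF) ** N_TAIT - B_TAIT

def tait_sound_speed(rho):
    """Sound speed c = sqrt(dp/drho) = sqrt(n * (p + B) / rho)."""
    return math.sqrt(N_TAIT * (tait_pressure(rho) + B_TAIT) / rho)

c_ambient = tait_sound_speed(RHO_REF)
p_compressed = tait_pressure(1.01 * RHO_REF)  # 1% compression
```

Because the pressure rises steeply with density, small compressions of the liquid produce large pressure changes, which is what lets a flow solver with this closure carry nonlinear acoustic waves.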

Journal Article
TL;DR: Speech understanding significantly improved with increasing active electrodes up to 22, particularly for subjects with better spectro-temporal resolution, suggesting some listeners may be able to utilize the full electrode array and may not be limited to eight channels of information as indicated in previous studies.
Abstract: This study reconsiders the number of effective channels in contemporary cochlear implants. Subjects listened to matrix sentences with a competing talker using their clinical map (up to 22 electrodes) and reduced-channel maps using 12, 8, and 4 electrodes. Spectro-temporal modulation thresholds and reading span were measured to explore intersubject variability. Results show that speech understanding significantly improved with increasing active electrodes up to 22, particularly for subjects with better spectro-temporal resolution. These findings suggest some listeners may be able to utilize the full electrode array and may not be limited to eight channels of information as indicated in previous studies.

Journal Article
TL;DR: Results show that voice quality is quite systematically tied to F0 in Mandarin, and the presence of creak is not exclusively limited to tone 3, but can accompany any of the low pitch targets in the Mandarin tones; further, voice quality overall covaries with pitch height in a wedge-shaped function.
Abstract: This study investigates the interaction between voice quality and pitch by revisiting the well-known case of Mandarin creaky voice. This study first provides several pieces of experimental data to assess whether the mechanism behind allophonic creaky voice in Mandarin is tied to tonal categories or is driven by phonetic pitch ranges. The results show that the presence of creak is not exclusively limited to tone 3, but can accompany any of the low pitch targets in the Mandarin tones; further, tone 3 is less creaky when the overall pitch range is raised, but more creaky when the overall pitch range is lowered. More importantly, tone 3 is not unique in this regard, and other tones such as tone 1 are also subject to similar variations. In sum, voice quality is quite systematically tied to F0 in Mandarin. Results from a pitch glide experiment further suggest that voice quality overall covaries with pitch height in a wedge-shaped function. Non-modal voice tends to occur when pitch production exceeds certain limits. Voice quality, thus, has the potential to enhance the perceptual distinctiveness of extreme pitch targets.

Journal Article
TL;DR: Grain size distribution is discussed within the context of previous attenuation models valid for arbitrary crystallite symmetries and is anticipated to play an important role in microstructural characterization research associated with ultrasonic scattering.
Abstract: Elastic wave scattering at grain boundaries in polycrystalline media can be quantified to determine microstructural properties. The amplitude drop observed for coherent wave propagation (attenuation) as well as diffuse-field scattering events have been extensively studied. In all cases, the scattering shows a clear dependence on grain size, grain shape, and microstructural texture. Models used to quantify scattering experiments are often developed assuming dependence on a single spatial length scale, usually, mean grain diameter. However, several microscopy studies suggest that most metals have a log normal distribution of grain sizes. In this study, grain size distribution is discussed within the context of previous attenuation models valid for arbitrary crystallite symmetries. Results are presented for titanium using a range of distribution means and widths assuming equiaxed grains and no preferred crystallographic orientation. The longitudinal and shear attenuations are shown to vary with respect to the frequency dependence for varying distribution widths even when the volumetric mean grain size is held constant. Furthermore, the results suggest that grain size estimates based on attenuation can have large errors if the distribution is neglected. This work is anticipated to play an important role in microstructural characterization research associated with ultrasonic scattering.

Journal Article
TL;DR: The results show that distracting background speech largely explains the overall perception of noise, and support the role of room acoustic design, i.e., the simultaneous use of absorption, blocking, and masking in the attainment of good working conditions in open-plan offices.
Abstract: Previous research suggests that, in open-plan offices, noise complaints may be related to the high intelligibility of speech. Distraction distance, which is based on the Speech Transmission Index, can be used to objectively describe the acoustic quality of open-plan offices. However, the relation between distraction distance and perceived noise disturbance has not been established in field studies. The aim of this study was to synthesize evidence from separate studies covering 21 workplaces (N = 883 respondents) and a wide range of room acoustic conditions. The data included both questionnaire surveys and room acoustic measurements [ISO 3382-3 (2012) (International Organization for Standardization, Geneva, Switzerland)]. Distraction distance, the spatial decay rate of speech, speech level at 4 m from the speaker, and the average background noise level were examined as possible predictors of perceived noise disturbance. The data were analyzed with individual participant data meta-analysis. The results show that distracting background speech largely explains the overall perception of noise. An increase in distraction distance predicts an increase in disturbance by noise, whereas the other quantities may not alone be associated with noise disturbance. The results support the role of room acoustic design, i.e., the simultaneous use of absorption, blocking, and masking in the attainment of good working conditions in open-plan offices.

Journal Article
TL;DR: Results suggested that a smaller number of relatively independent channels provide a better outcome than using all channels that might interact, and spectral resolution, as assessed by spectral-ripple discrimination thresholds, significantly improved after deactivation of five high-threshold sites.
Abstract: The study examined whether the benefit of deactivating stimulation sites estimated to have broad neural excitation was attributed to improved spectral resolution in cochlear implant users. The subjects' spatial neural excitation pattern was estimated by measuring low-rate detection thresholds across the array [see Zhou (2016). PLoS One 11, e0165476]. Spectral resolution, as assessed by spectral-ripple discrimination thresholds, significantly improved after deactivation of five high-threshold sites. The magnitude of improvement in spectral-ripple discrimination thresholds predicted the magnitude of improvement in speech reception thresholds after deactivation. Results suggested that a smaller number of relatively independent channels provide a better outcome than using all channels that might interact.

Journal Article
TL;DR: The study explores auditory salience in a set of dynamic natural scenes and indicates that contextual information about the entire scene over both short and long scales needs to be considered in order to properly account for perceptual judgments of salience.
Abstract: Salience describes the phenomenon by which an object stands out from a scene. While its underlying processes are extensively studied in vision, mechanisms of auditory salience remain largely unknown. Previous studies have used well-controlled auditory scenes to shed light on some of the acoustic attributes that drive the salience of sound events. Unfortunately, the use of constrained stimuli in addition to a lack of well-established benchmarks of salience judgments hampers the development of comprehensive theories of sensory-driven auditory attention. The present study explores auditory salience in a set of dynamic natural scenes. A behavioral measure of salience is collected by having human volunteers listen to two concurrent scenes and indicate continuously which one attracts their attention. By using natural scenes, the study takes a data-driven rather than experimenter-driven approach to exploring the parameters of auditory salience. The findings indicate that the space of auditory salience is multidimensional (spanning loudness, pitch, spectral shape, as well as other acoustic attributes), nonlinear and highly context-dependent. Importantly, the results indicate that contextual information about the entire scene over both short and long scales needs to be considered in order to properly account for perceptual judgments of salience.

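Journal Article
TL;DR: The matched field problem is reformulated as an underdetermined, convex optimization problem using compressive sensing (CS) implemented with basis pursuit. CS often displays less ambiguity than the adaptive white noise constraint processor, demonstrating robustness to data-replica mismatch.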
Abstract: Matched field processing is a generalized beamforming method that matches received array data to a dictionary of replica vectors in order to locate one or more sources. Its solution set is sparse since there are considerably fewer sources than replicas. Using compressive sensing (CS) implemented using basis pursuit, the matched field problem is reformulated as an underdetermined, convex optimization problem. CS estimates the unknown source amplitudes using the replica dictionary to best explain the data, subject to a row-sparsity constraint. This constraint selects the best matching replicas within the dictionary when using multiple observations and/or frequencies. For a single source, theory and simulations show that the performance of CS and the Bartlett processor are equivalent for any number of snapshots. Contrary to most adaptive processors, CS can also accommodate coherent sources. For single and multiple incoherent sources, simulations indicate that CS offers modest localization performance improvement over the adaptive white noise constraint processor. SWellEx-96 experiment data results show comparable performance for both processors when localizing a weaker source in the presence of a stronger source. Moreover, CS often displays less ambiguity, demonstrating it is robust to data-replica mismatch.
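
As a point of reference for the Bartlett processor that CS is compared against, here is a minimal sketch of the Bartlett ambiguity surface: the data covariance projected onto each unit-norm replica. The replicas below are toy phase ramps chosen for illustration; in matched field processing they would come from an acoustic propagation model, and `bartlett_surface` is a hypothetical name. Basis pursuit itself is not implemented here.

```python
import cmath
import math

def bartlett_surface(replicas, snapshots):
    """Bartlett output w^H R w for each unit-normalized replica w,
    where R is the sample covariance matrix of the snapshots."""
    surface = []
    for w in replicas:
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        power = 0.0
        for d in snapshots:
            # w^H R w equals the mean of |w^H d|^2 over the snapshots.
            proj = sum((x / norm).conjugate() * y for x, y in zip(w, d))
            power += abs(proj) ** 2
        surface.append(power / len(snapshots))
    return surface

# Toy dictionary: four candidate replicas on a four-sensor array.
replicas = [[cmath.exp(1j * phase * n) for n in range(4)]
            for phase in (0.0, 0.7, 1.4, 2.1)]
# One noise-free snapshot from a source matching replica index 2.
data = [[2.0 * x for x in replicas[2]]]
surface = bartlett_surface(replicas, data)
best = max(range(len(surface)), key=surface.__getitem__)
```

The surface peaks at the replica that matches the data. A sparse method would instead look for a small set of replica amplitudes that jointly explain the data.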

Journal Article
TL;DR: An adaptive procedure for controlling the signal-to-noise ratio (SNR) when rating the subjectively perceived listening effort (Adaptive Categorical Listening Effort Scaling) is described.
Abstract: An adaptive procedure for controlling the signal-to-noise ratio (SNR) when rating the subjectively perceived listening effort (Adaptive Categorical Listening Effort Scaling) is described. For this, the listening effort is rated on a categorical scale with 14 steps after the presentation of three sentences in a background masker. In a first phase of the procedure, the individual SNR range for ratings from "no effort" to "extreme effort" is estimated. In the following phases, stimuli with randomly selected SNRs within this range are presented. One or two linear regression lines are fitted to the data describing subjective listening effort as a function of SNR. The results of the adaptive procedure are independent of the initial SNR. Although a static procedure using fixed, predefined SNRs produced similar results, the adaptive procedure avoided lengthy pretests for suitable SNRs and limited possible bias in the rating procedures. The adaptive procedure resolves individual differences, as well as differences between maskers. Inter-individual standard deviations are about three times as large as intra-individual standard deviations and the intra-class correlation coefficient for test-retest reliability is, on average, 0.9.
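
The regression step described above can be sketched with an ordinary least-squares line relating effort ratings to SNR. This covers only the single-line case with made-up data; the two-line fit, the rating scale mapping, and the adaptive SNR selection are not reproduced, and the function names are hypothetical.

```python
def fit_line(snr_db, effort):
    """Ordinary least-squares fit: effort = slope * snr + intercept."""
    n = len(snr_db)
    mean_x = sum(snr_db) / n
    mean_y = sum(effort) / n
    sxx = sum((x - mean_x) ** 2 for x in snr_db)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(snr_db, effort))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def snr_for_rating(slope, intercept, rating):
    """SNR at which the fitted line predicts the given effort rating."""
    return (rating - intercept) / slope

# Made-up ratings: effort falls as the SNR improves.
snrs = [-10.0, -5.0, 0.0, 5.0, 10.0]
ratings = [12.0, 10.0, 7.0, 5.0, 2.0]
slope, intercept = fit_line(snrs, ratings)
```

Inverting the fitted line gives the SNR associated with a chosen effort category, which is one way such a function could be used to compare listeners or maskers.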

Journal Article
TL;DR: A method is presented to characterize macroscopically homogeneous rigid frame porous media from impedance tube measurements by deterministic and statistical inversion; it finds reliable parameter and uncertainty estimates for the six pore parameters quickly with minimal user input.
Abstract: A method to characterize macroscopically homogeneous rigid frame porous media from impedance tube measurements by deterministic and statistical inversion is presented. Equivalent density and bulk modulus of the samples are reconstructed with the scattering matrix formalism, and are then linked to their physical parameters via the Johnson-Champoux-Allard-Lafarge model. The model includes six parameters, namely the porosity, tortuosity, viscous and thermal characteristic lengths, and static flow and thermal permeabilities. The parameters are estimated from the measurements in two ways. The first one is a deterministic procedure that finds the model parameters by minimizing a cost function in the least squares sense. The second approach is based on statistical inversion. It can be used to assess the validity of the least squares estimate, but also presents several advantages since it provides valuable information on the uncertainty and correlation between the parameters. Five porous samples with a range of pore properties are tested, and the pore parameter estimates given by the proposed inversion processes are compared to those given by other characterization methods. Joint parameter distributions are shown to demonstrate the correlations. Results show that the proposed methods find reliable parameter and uncertainty estimates for the six pore parameters quickly with minimal user input.

Journal Article
TL;DR: The emitted pressure level (EPL) represents the OAE level that would be recorded were the ear canal replaced by an infinite tube with no reflections, and provides a powerful way to reduce the variability of OAE measurements and improve their ability to detect cochlear changes.
Abstract: Otoacoustic emissions (OAEs) provide an acoustic fingerprint of the inner ear, and changes in this fingerprint may indicate changes in cochlear function arising from efferent modulation, aging, noise trauma, and/or exposure to harmful agents. However, the reproducibility and diagnostic power of OAE measurements is compromised by the variable acoustics of the ear canal, in particular, by multiple reflections and the emergence of standing waves at relevant frequencies. Even when stimulus levels are controlled using methods that circumvent standing-wave problems (e.g., forward-pressure-level calibration), distortion-product otoacoustic emission (DPOAE) levels vary with probe location by 10–15 dB near half-wave resonant frequencies. The method presented here estimates the initial outgoing OAE pressure wave at the eardrum from measurements of the conventional OAE, allowing one to separate the emitted OAE from the many reflections trapped in the ear canal. The emitted pressure level (EPL) represents the OAE level that would be recorded were the ear canal replaced by an infinite tube with no reflections. When DPOAEs are expressed using EPL, their variation with probe location decreases to the test–retest repeatability of measurements obtained at similar probe positions. EPL provides a powerful way to reduce the variability of OAE measurements and improve their ability to detect cochlear changes.

Journal ArticleDOI
TL;DR: Results showed that head movements can substantially enhance externalization, especially for frontal and rear sources, and that externalization can persist once the subject has stopped moving his/her head.
Abstract: Binaural reproduction aims at recreating a realistic audio scene at the ears of the listener using headphones. In the real acoustic world, sound sources tend to be externalized (that is, perceived to be emanating from a source out in the world) rather than internalized (that is, perceived to be emanating from inside the head). Unfortunately, several studies report a collapse of externalization, especially with frontal and rear virtual sources, when listening to binaural content using non-individualized Head-Related Transfer Functions (HRTFs). The present study examines whether or not head movements coupled with a head tracking device can compensate for this collapse. For each presentation, a speech stimulus was presented over headphones at different azimuths, using several intermixed sets of non-individualized HRTFs for the binaural rendering. The head tracker could either be active or inactive, and the subjects could either be asked to rotate their heads or to keep them as stationary as possible. After each presentation, subjects reported to what extent the stimulus had been externalized. In contrast to several previous studies, results showed that head movements can substantially enhance externalization, especially for frontal and rear sources, and that externalization can persist once the subject has stopped moving his/her head.

Journal ArticleDOI
TL;DR: Results demonstrate that SBL behaves similar to an adaptive processor when localizing a weaker source in the presence of a stronger source, is robust to mismatch, and exhibits improved localization performance when compared to the other processors.
Abstract: The multi-snapshot, multi-frequency sparse Bayesian learning (SBL) processor is derived and its performance compared to the Bartlett, minimum variance distortionless response, and white noise constraint processors for the matched field processing application. The two-source model and data scenario of interest includes realistic mismatch implemented in the form of array tilt and data snapshots not exactly corresponding to the range-depth grid of the replica vectors. Results demonstrate that SBL behaves similar to an adaptive processor when localizing a weaker source in the presence of a stronger source, is robust to mismatch, and exhibits improved localization performance when compared to the other processors. Unlike the basis or matching pursuit methods, SBL automatically determines sparsity and its solution can be interpreted as an ambiguity surface. Because of its computational efficiency and performance, SBL is practical for applications requiring adaptive and robust processing.
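As a rough illustration of the Bartlett baseline named in this abstract, the sketch below computes a conventional ambiguity surface from a replica dictionary and a set of data snapshots. It is not the SBL processor and not the paper's setup: the replicas here are plane-wave steering vectors for a half-wavelength line array (a stand-in for replicas computed on a range-depth grid), and all names (`bartlett_surface`, `replicas`, `snapshots`) and values are assumptions chosen for the example.

```python
import numpy as np

def bartlett_surface(replicas, snapshots):
    """Bartlett power w^H R w for every replica column.

    replicas:  (n_sensors, n_grid) complex dictionary (assumed layout)
    snapshots: (n_sensors, n_snapshots) complex data
    """
    # Normalize each replica to unit length so grid points are comparable.
    w = replicas / np.linalg.norm(replicas, axis=0, keepdims=True)
    # Sample covariance matrix averaged over the snapshots.
    scm = snapshots @ snapshots.conj().T / snapshots.shape[1]
    # Quadratic form w_m^H R w_m evaluated for every grid index m.
    return np.real(np.einsum("im,ij,jm->m", w.conj(), scm, w))

# Hypothetical stand-in dictionary: plane waves on a 16-element,
# half-wavelength-spaced line array, scanned from -80 to 80 degrees.
n_sensors = 16
angles_deg = np.arange(-80, 81, 2)
sensor_idx = np.arange(n_sensors)[:, None]
replicas = np.exp(1j * np.pi * sensor_idx * np.sin(np.deg2rad(angles_deg))[None, :])

# Noise-free synthetic data: one source on the grid with a random
# complex amplitude per snapshot (seeded for repeatability).
true_index = 50  # corresponds to 20 degrees on this grid
rng = np.random.default_rng(0)
amplitudes = rng.standard_normal(20) + 1j * rng.standard_normal(20)
snapshots = replicas[:, [true_index]] * amplitudes[None, :]

surface = bartlett_surface(replicas, snapshots)
estimated_angle = angles_deg[np.argmax(surface)]
```

In this idealized single-source, noise-free case the surface peaks at the true grid point. The mismatch and two-source effects discussed in the abstract are deliberately left out of the sketch.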

Journal ArticleDOI
TL;DR: The findings of this study strongly suggest that the acoustic nature of vowel nasality is both language- and speaker-specific, and that, like vowel formants, nasality measurements require speaker normalization for across-speaker comparison, and that these acoustic properties should not be taken as constant across different languages.
Abstract: Although much is known about the linguistic function of vowel nasality, whether contrastive (as in French) or coarticulatory (as in English), and much effort has gone into identifying potential correlates for the phenomenon, this study examines these proposed features to find the optimal acoustic feature(s) for nasality measurement. To this end, a corpus of 4778 oral and nasal vowels in English and French was collected, and data for 22 features were extracted. A series of linear mixed-effects regressions highlighted three promising features with large oral-to-nasal feature differences and strong effects relative to normal oral vowel variability: A1-P0, F1's bandwidth, and spectral tilt. However, these three features, particularly A1-P0, showed considerable variation in baseline and range across speakers and vowels within each language. Moreover, although the features were consistent in direction across both languages, French speakers' productions showed markedly stronger effects, and showed evidence of spectral tilt beyond the nasal norm being used to enhance the oral-nasal contrast. These findings strongly suggest that the acoustic nature of vowel nasality is both language- and speaker-specific, and that, like vowel formants, nasality measurements require speaker normalization for across-speaker comparison, and that these acoustic properties should not be taken as constant across different languages.

Journal ArticleDOI
TL;DR: The design of a membrane acoustic metamaterial absorber is described in which magnetic negative stiffness is employed to reduce the size of the back cavity and it is demonstrated that a small cavity with negative stiffness can achieve the acoustic impedance of a large cavity.
Abstract: A membrane absorber usually requires a large back cavity to achieve low-frequency sound absorption. This paper describes the design of a membrane acoustic metamaterial absorber in which magnetic negative stiffness is employed to reduce the size of the back cavity. As a baseline for the present research, analysis of a typical membrane sound absorber based on an equivalent circuit model is presented first. Then, a theoretical model is established by introducing negative stiffness into a standard absorber. It is demonstrated that a small cavity with negative stiffness can achieve the acoustic impedance of a large cavity and that the absorption peak is shifted to lower frequencies. Experimental results from an impedance tube test are also presented to validate this idea and show that negative stiffness can be employed to design compact low-frequency membrane absorbers.
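The claim that a small cavity with negative stiffness can match the impedance of a large cavity can be illustrated with a lumped-element sketch. This is not the paper's equivalent circuit model: it assumes the low-frequency limit in which a sealed air cavity of depth D acts as a spring of stiffness ρc²/D per unit area, neglects the membrane's own stiffness, and uses hypothetical values for the surface mass, the cavity depths, and the air properties.

```python
import numpy as np

RHO = 1.21  # air density in kg/m^3 (assumed value)
C = 343.0   # speed of sound in m/s (assumed value)

def cavity_stiffness(depth):
    """Stiffness per unit area (Pa/m) of a shallow sealed air cavity."""
    return RHO * C**2 / depth

def peak_frequency(mass, stiffness):
    """Resonance frequency (Hz) of a mass-spring surface."""
    return np.sqrt(stiffness / mass) / (2 * np.pi)

def absorption(freq, mass, resistance, stiffness):
    """Normal-incidence absorption of a lumped mass-spring-damper surface."""
    omega = 2 * np.pi * freq
    z = resistance + 1j * (omega * mass - stiffness / omega)
    reflection = (z - RHO * C) / (z + RHO * C)
    return 1 - np.abs(reflection) ** 2

mass = 0.5          # membrane surface density in kg/m^2 (hypothetical)
small_depth = 0.02  # compact cavity depth in m (hypothetical)
large_depth = 0.10  # reference cavity depth in m (hypothetical)

k_small = cavity_stiffness(small_depth)
k_large = cavity_stiffness(large_depth)

# Negative stiffness that makes the small cavity behave like the large one.
k_negative = k_large - k_small

f_small = peak_frequency(mass, k_small)
f_compensated = peak_frequency(mass, k_small + k_negative)
f_large = peak_frequency(mass, k_large)

# With the resistance matched to the impedance of air, absorption at resonance is total.
alpha_peak = absorption(f_compensated, mass, RHO * C, k_small + k_negative)
```

Under these assumptions the compensated small cavity resonates at the same frequency as the large cavity, which is lower than the resonance of the uncompensated small cavity.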

Journal ArticleDOI
TL;DR: Underwater radiated noise from merchant ships was measured opportunistically from multiple spatial aspects to estimate signature source levels and directionality; previously reported broadband levels at 10° aspect may be ∼12 dB lower than the respective surface-affected ANSI/ISO standard-derived levels.
Abstract: Underwater radiated noise from merchant ships was measured opportunistically from multiple spatial aspects to estimate signature source levels and directionality. Transiting ships were tracked via the Automatic Identification System in a shipping lane while acoustic pressure was measured at the ships' keel and beam aspects. Port and starboard beam aspects were 15°, 30°, and 45° in compliance with ship noise measurements standards [ANSI/ASA S12.64 (2009) and ISO 17208-1 (2016)]. Additional recordings were made at a 10° starboard aspect. Source levels were derived with a spherical propagation (surface-affected) or a modified Lloyd's mirror model to account for interference from surface reflections (surface-corrected). Ship source depths were estimated from spectral differences between measurements at different beam aspects. Results were exemplified with a 4870 and a 10 036 twenty-foot equivalent unit container ship at 40%–56% and 87% of service speeds, respectively. For the larger ship, opportunistic ANSI/I

Journal ArticleDOI
TL;DR: Overall, there is a major contrast between sadness and tenderness on the one hand, and anger, joy, and pride on the other, which can be explained by the high power and arousal characteristics of the emotions with high levels on these components.
Abstract: There has been little research on the acoustic correlates of emotional expression in the singing voice. In this study, two pertinent questions are addressed: How does a singer's emotional interpretation of a musical piece affect acoustic parameters in the sung vocalizations? Are these patterns specific enough to allow statistical discrimination of the intended expressive targets? Eight professional opera singers were asked to sing the musical scale upwards and downwards (using meaningless content) to express different emotions, as if on stage. The studio recordings were acoustically analyzed with a standard set of parameters. The results show robust vocal signatures for the emotions studied. Overall, there is a major contrast between sadness and tenderness on the one hand, and anger, joy, and pride on the other. This is based on low vs high levels on the components of loudness, vocal dynamics, high perturbation variation, and a tendency for high low-frequency energy. This pattern can be explained by the high power and arousal characteristics of the emotions with high levels on these components. A multiple discriminant analysis yields classification accuracy greatly exceeding chance level, confirming the reliability of the acoustic patterns.

Journal ArticleDOI
TL;DR: An approach that combines different rings of microphones together with appropriate radii can mitigate both the white noise amplification and deep nulls problems. Simulation results justify the superiority of the robust CCDMA approach over the traditional CDMAs and robust CDMAs.
Abstract: Circular differential microphone arrays (CDMAs) have been extensively studied in speech and audio applications for their steering flexibility, potential to achieve frequency-invariant directivity patterns, and high directivity factors (DFs). However, CDMAs suffer from both white noise amplification and deep nulls in the DF and in the white noise gain (WNG) due to spatial aliasing, which considerably restricts their use in practical systems. The minimum-norm filter can improve the WNG by using more microphones than required for a given differential array order; but this filter increases the array aperture (radius), which exacerbates the spatial aliasing problem and worsens the nulls problem in the DF. Through theoretical analysis, this research finds that the nulls of the CDMAs are caused by the zeros in the denominators of the filters' coefficients, i.e., the zeros of the Bessel function. To deal with both the white noise amplification and deep nulls problems, this paper develops an approach that combines