Showing papers in "Journal of the Acoustical Society of America in 2021"


Journal ArticleDOI
TL;DR: In this paper, a deep learning-based approach is proposed to extend current knowledge of metamaterial design in acoustics by using conditional generative adversarial networks (GANs).
Abstract: Metamaterials are attracting increasing interest in the field of acoustics due to their sound insulation effects. By periodically arranged structures, acoustic metamaterials can influence the way sound propagates in acoustic media. To date, the design of acoustic metamaterials relies primarily on the expertise of specialists since most effects are based on localized solutions and interference. This paper outlines a deep learning-based approach to extend current knowledge of metamaterial design in acoustics. We develop a design method by using conditional generative adversarial networks. The generative network proposes a cell candidate regarding a desired transmission behavior of the metamaterial. To validate our method, numerical simulations with the finite element method are performed. Our study reveals considerable insight into design strategies for sound insulation tasks. By providing design directives for acoustic metamaterials, cell candidates can be inspected and tailored to achieve desirable transmission characteristics.

33 citations


Journal ArticleDOI
TL;DR: This paper examined how signal degradation and loss of visual information due to masks affect intelligibility and memory for native and non-native speech and found that clear speech with a mask significantly improved accuracy in all listening conditions.
Abstract: Though necessary, protective mask wearing in response to the COVID-19 pandemic presents communication challenges. The present study examines how signal degradation and loss of visual information due to masks affect intelligibility and memory for native and non-native speech. We also test whether clear speech can alleviate perceptual difficulty for masked speech. One native and one non-native speaker of English recorded video clips in conversational speech without a mask and conversational and clear speech with a mask. Native English listeners watched video clips presented in quiet or mixed with competing speech. The results showed that word recognition and recall of speech produced with a mask can be as accurate as without a mask in optimal listening conditions. Masks affected non-native speech processing at easier noise levels than native speech. Clear speech with a mask significantly improved accuracy in all listening conditions. Speaking clearly, reducing noise, and using surgical masks as well as good signal amplification can help compensate for the loss of intelligibility due to background noise, lack of visual cues, physical distancing, or non-native speech. The findings have implications for communication in classrooms and hospitals where listeners interact with teachers and healthcare providers, oftentimes non-native speakers, through their protective barriers.

33 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigated voice acoustic correlates of COVID-19 infection based on a comprehensive acoustic parameter set, employing the Mann-Whitney U test and calculating effect sizes to identify features with prominent group differences.
Abstract: COVID-19 is a global health crisis that has been affecting our daily lives throughout the past year. The symptomatology of COVID-19 is heterogeneous with a severity continuum. Many symptoms are related to pathological changes in the vocal system, leading to the assumption that COVID-19 may also affect voice production. For the first time, the present study investigates voice acoustic correlates of a COVID-19 infection based on a comprehensive acoustic parameter set. We compare 88 acoustic features extracted from recordings of the vowels /i:/, /e:/, /u:/, /o:/, and /a:/ produced by 11 symptomatic COVID-19 positive and 11 COVID-19 negative German-speaking participants. We employ the Mann-Whitney U test and calculate effect sizes to identify features with prominent group differences. The mean voiced segment length and the number of voiced segments per second yield the most important differences across all vowels indicating discontinuities in the pulmonic airstream during phonation in COVID-19 positive participants. Group differences in front vowels are additionally reflected in fundamental frequency variation and the harmonics-to-noise ratio, group differences in back vowels in statistics of the Mel-frequency cepstral coefficients and the spectral slope. Our findings represent an important proof-of-concept contribution for a potential voice-based identification of individuals infected with COVID-19.

29 citations
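
Note: the group comparison described in the abstract above (a Mann-Whitney U test plus an effect size per acoustic feature) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `mann_whitney_u` is hypothetical, the rank-biserial correlation is an assumed choice of effect size (the abstract does not say which one was used), and no p-value is computed.

```python
from itertools import chain


def mann_whitney_u(x, y):
    """Return (U for sample x, rank-biserial correlation) for two samples.

    Assumption: the rank-biserial correlation stands in for the
    effect size, which the abstract does not specify.
    """
    pooled = sorted(chain(x, y))
    # Assign each distinct value the mean of the ranks it occupies,
    # so that tied observations share one average rank.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n_x, n_y = len(x), len(y)
    rank_sum_x = sum(ranks[v] for v in x)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    # Ranges from -1 (all x below all y) to +1 (all x above all y).
    rank_biserial = 2 * u_x / (n_x * n_y) - 1
    return u_x, rank_biserial
```

With 11 participants per group, as in the study, each feature would be passed as two lists of 11 values; a value of the rank-biserial correlation near ±1 indicates almost complete separation of the groups on that feature.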


Journal ArticleDOI
TL;DR: In this article, the authors present a model where the substrate stiffness increases with depth but makes use of a wave that propagates with little or no dispersion, and consider the possible effects of substrate vibration upon fishes and invertebrates.
Abstract: This paper reviews the nature of substrate vibration within aquatic environments where seismic interface waves may travel along the surface of the substrate, generating high levels of particle motion. There are, however, few data on the ambient levels of particle motion close to the seabed and within the substrates of lakes and rivers. Nor is there information on the levels and the characteristics of the particle motion generated by anthropogenic sources in and on the substrate, which may have major effects upon fishes and invertebrates, all of which primarily detect particle motion. We therefore consider how to monitor substrate vibration and describe the information gained from modeling it. Unlike most acoustic modeling, we treat the substrate as a solid. Furthermore, we use a model where the substrate stiffness increases with depth but makes use of a wave that propagates with little or no dispersion. This shows the presence of higher levels of particle motion than those predicted from the acoustic pressures, and we consider the possible effects of substrate vibration upon fishes and invertebrates. We suggest that research is needed to examine the actual nature of substrate vibration and its effects upon aquatic animals.

27 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigated the noise attenuation properties of an acoustic liner consisting of Helmholtz resonators with extended necks (HRENs) for sound absorption in a prescribed frequency range from 700 to 1000 Hz.
Abstract: The noise attenuation properties of an acoustic liner consisting of Helmholtz resonators with extended necks (HRENs) are investigated. An optimal liner constructed by 16 inhomogeneous HRENs is designed to be effective in sound absorption in a prescribed frequency range from 700 to 1000 Hz. Its quasi-perfect absorption capability (average absorption coefficient above 0.9) is validated by measurements and simulations. The resonance frequencies of the individual resonators in the designed liner are just located within the effective absorption bandwidth, indicating the overlapping phenomenon of absorption peaks. In addition, the liner maintains a thin thickness, about 1/25th with respect to the longest operating wavelengths. To assess the acoustic performance of the designed liner in the presence of mean flow, experimental investigations are performed in a flow tube. Results show a near flat transmission loss is attained in the target frequency range by the designed liner. Additionally, the impedance of the uniform HREN-based liner is extracted at flow condition. In all, the inhomogeneous HREN-based liner is featured by the thin thickness and the excellent wide-band noise attenuation property. These features make the designed liner a promising solution for noise attenuation in both static and flow conditions.

27 citations
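
Note: the thickness claim in the abstract above (about 1/25th of the longest operating wavelength, with a lower band edge of 700 Hz) can be turned into a rough number. The sketch below assumes a speed of sound of 343 m/s in air, which the abstract does not state, and the function names are hypothetical; the result is an order-of-magnitude estimate, not the liner thickness reported in the paper.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Acoustic wavelength in metres at the given frequency."""
    return c / frequency_hz


def liner_thickness(lowest_frequency_hz, fraction=1 / 25, c=SPEED_OF_SOUND):
    """Thickness as a fraction of the longest operating wavelength.

    The longest wavelength corresponds to the lowest frequency of the band.
    """
    return fraction * wavelength(lowest_frequency_hz, c)


# Lower band edge of 700 Hz taken from the abstract.
thickness_m = liner_thickness(700.0)
print(f"Estimated liner thickness: {thickness_m * 1000:.1f} mm")
```

Under these assumptions the estimate comes out at roughly 2 cm, which is what "deep sub-wavelength" means in practice for this frequency band.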


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a standardized imaging protocol and scoring system for lung ultrasound (LUS) data and developed the first deep learning (DL) algorithms capable of evaluating LUS videos providing, for each video-frame, the score as well as semantic segmentation.
Abstract: In the current pandemic, lung ultrasound (LUS) played a useful role in evaluating patients affected by COVID-19. However, LUS remains limited to the visual inspection of ultrasound data, thus negatively affecting the reliability and reproducibility of the findings. Moreover, many different imaging protocols have been proposed, most of which lacked proper clinical validation. To address these problems, we were the first to propose a standardized imaging protocol and scoring system. Next, we developed the first deep learning (DL) algorithms capable of evaluating LUS videos providing, for each video-frame, the score as well as semantic segmentation. Moreover, we have analyzed the impact of different imaging protocols and demonstrated the prognostic value of our approach. In this work, we report on the level of agreement between the DL and LUS experts, when evaluating LUS data. The results show a percentage of agreement between DL and LUS experts of 85.96% in the stratification between patients at high risk of clinical worsening and patients at low risk. These encouraging results demonstrate the potential of DL models for the automatic scoring of LUS data, when applied to high quality data acquired according to a standardized imaging protocol.

26 citations


Journal ArticleDOI
TL;DR: An extensive and comprehensive overview of the techniques developed for B/A measurement of liquid and liquid-like media, identifying the methods that are most promising from a clinical perspective and suggesting directions that may lead to further improvement.
Abstract: The nonlinear parameter of ultrasound B/A has been shown to be a useful diagnostic parameter, reflecting medium content, structure, and temperature. Despite its recognized value, B/A is not yet used as a diagnostic tool in the clinic due to the limitations of current measurement and imaging techniques. This review presents an extensive and comprehensive overview of the techniques developed for B/A measurement of liquid and liquid-like media (e.g., tissue), identifying the methods that are most promising from a clinical perspective. This work summarizes the progress made in the field and the typical challenges on the way to B/A estimation. Limitations and problems with the current techniques are identified, suggesting directions that may lead to further improvement. Since the basic theory of the physics behind the measurement strategies is presented, it is also suited for a reader who is new to nonlinear ultrasound.

26 citations


Journal ArticleDOI
TL;DR: The results corroborate earlier published results for non-transparent masks, but transparent options have greater attenuation, resonant peaks, and deflect sounds in ways that non-transparent masks do not.
Abstract: The widespread use of face coverings during the COVID-19 pandemic has created communication challenges for many individuals, particularly for those who are deaf or hard of hearing and for those who must speak through masks in suboptimal conditions. This study includes some newer mask options as well as transparent masks to help those who depend on lipreading and other facial cues. The results corroborate earlier published results for non-transparent masks, but transparent options have greater attenuation, resonant peaks, and deflect sounds in ways that non-transparent masks do not. Although transparent face coverings have poorer acoustic performance, the presence of visual cues remains important for both verbal and non-verbal communication. Fortunately, there are creative solutions and technologies available to overcome audio and/or visual barriers caused by face coverings.

25 citations


Journal ArticleDOI
Abstract: The effect of face covering masks on listeners' recall of spoken sentences was investigated. Thirty-two German native listeners watched video recordings of a native speaker producing German sentences with and without a face mask, and then completed a cued-recall task. Listeners recalled significantly fewer words when the sentences had been spoken with a face mask. This might suggest that face masks increase processing demands, which in turn leaves fewer resources for encoding speech in memory. The result is also informative for policy-makers during the COVID-19 pandemic, regarding the impact of face masks on oral communication.

25 citations


Journal ArticleDOI
TL;DR: This study evaluated the suitability of video calls for the phonetic analysis of vowel configurations, mergers, and nasalization by comparing simultaneous recordings from three popular video conferencing apps to those taken from professional equipment and an offline iPad identical to those running the apps.
Abstract: When the COVID-19 pandemic halted in-person data collection, many linguists adopted new online technologies to replace traditional methods, including video conferencing applications (apps) like Zoom (Zoom Video Communications, San Jose, CA), which allow live interaction with remote participants. This study evaluated the suitability of video calls for the phonetic analysis of vowel configurations, mergers, and nasalization by comparing simultaneous recordings from three popular video conferencing apps (Zoom; Microsoft Skype, Redmond, WA; Microsoft Teams, Redmond, WA) to those taken from professional equipment (H4n field recorder) and an offline iPad (Apple, Cupertino, CA) identical to those running the apps. All three apps conveyed vowel arrangements and nasalization patterns relatively faithfully, but absolute measurements varied, particularly for the female speaker and in the 750–1500 Hz range, which affected the locations (F1 × F2) of low and back vowels and reduced nasalization measurements (A1-P0) for the female's prenasal vowels. Based on these results, we assess the validity of remote recording using these apps and offer recommendations for the best practices for collecting high fidelity acoustic phonetic data from a distance.

24 citations


Journal ArticleDOI
TL;DR: In this paper, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with an external microphone (henceforth, H6) and compared with two alternatives accessible to potential participants at home.
Abstract: Face-to-face speech data collection has been next to impossible globally as a result of the COVID-19 restrictions. To address this problem, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with an external microphone (henceforth, H6) and compared with two alternatives accessible to potential participants at home: the Zoom meeting application (henceforth, Zoom) and two lossless mobile phone applications (Awesome Voice Recorder, and Recorder; henceforth, Phone). F0 was tracked accurately by all of the devices; however, for formant analysis (F1, F2, F3), Phone performed better than Zoom, i.e., more similarly to H6, although the data extraction method (VoiceSauce, Praat) also resulted in differences. In addition, Zoom recordings exhibited unexpected drops in intensity. The results suggest that lossless format phone recordings present a viable option for at least some phonetic studies.

Journal ArticleDOI
TL;DR: In this article, a deep transfer learning (DTL) method is proposed for the direction of arrival (DOA) estimation using a single-vector sensor, which involves training a convolutional neural network (CNN) with synthetic data in source domain and then adapting the source domain to target domain with available at-sea data.
Abstract: A deep transfer learning (DTL) method is proposed for the direction of arrival (DOA) estimation using a single-vector sensor. The method involves training of a convolutional neural network (CNN) with synthetic data in source domain and then adapting the source domain to target domain with available at-sea data. The CNN is fed with the cross-spectrum of acoustical pressure and particle velocity during the training process to learn DOAs of a moving surface ship. For domain adaptation, first convolutional layers of the pre-trained CNN are copied to a target CNN, and the remaining layers of the target CNN are randomly initialized and trained on at-sea data. Numerical tests and real data results suggest that the DTL yields more reliable DOA estimates than a conventional CNN, especially with interfering sources.
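
Note: the cross-spectrum of pressure and particle velocity that the abstract above feeds to the CNN is also the basis of the classical intensity-based bearing estimate for a single vector sensor. The sketch below shows that conventional estimator for one frequency bin, not the deep transfer learning method of the paper. The function names are hypothetical, and the sign convention (particle velocity pointing along the propagation direction, away from the source; azimuth measured counterclockwise from the x axis) is an assumption.

```python
import math


def cross_spectrum(p, v):
    """Single-bin cross-spectrum S_pv = P * conj(V) of two complex spectra."""
    return p * v.conjugate()


def doa_from_cross_spectra(s_pvx, s_pvy):
    """Estimate the source azimuth in degrees, in [0, 360).

    The real part of each cross-spectrum is proportional to the active
    intensity along that axis. Intensity points along the propagation
    direction (away from the source), so the bearing to the source is
    the opposite direction.
    """
    ix, iy = s_pvx.real, s_pvy.real
    return math.degrees(math.atan2(-iy, -ix)) % 360.0
```

For a single plane wave in a noise-free field this returns the true bearing; with interfering sources the estimate is biased toward the intensity-weighted mean direction, which is the situation where the abstract reports the learned approach to be more reliable.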

Journal ArticleDOI
TL;DR: In this article, a new metamaterial has been developed with the use of a polyvinyl chloride membrane on which buttons have been glued, and two types of buttons were used, with different weights, placing them on the membrane according to a radial geometry.
Abstract: Metamaterials are designed by arranging artificial structural elements according to periodic geometries to obtain advantageous and unusual properties when they are hit by waves. Initially designed to interact with electromagnetic waves, their use naturally extended to sound waves, proving to be particularly useful for the construction of containment and soundproofing systems in buildings. In this work, a new metamaterial has been developed with the use of a polyvinyl chloride membrane on which buttons have been glued. Two types of buttons were used, with different weights, placing them on the membrane according to a radial geometry. Each sample of metamaterial was subjected to sound absorption coefficient measurements using the impedance tube. Measurements were made using the samples by setting three configurations, creating a cavity with different thicknesses. The results of the measurements were subsequently used as input for training a simulation model based on artificial neural networks. The model showed an excellent generalization capacity, returning estimates of the acoustic absorption coefficient of the metamaterial very similar to the measured value. Subsequently, the model was used to perform a sensitivity analysis to evaluate the contribution of the various input variables on the returned output.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the optimal rotor spacing distance configuration to minimize the noise annoyance in a UAV with a series of psychoacoustic metrics (i.e., loudness, fluctuation strength, roughness, sharpness, and tonality).
Abstract: Unmanned aerial vehicle (UAV) technologies are rapidly advancing due to the unlimited number of applications from parcel delivery to people transportation. As the UAV market expands, community noise impact will become a significant problem for public acceptance. Compact drone architectures based on contra-rotating propellers bring significant benefits in terms of aerodynamic performance and redundancy to ensure vehicle control in case of component failure. However, contra-rotating propellers are severely noisy if not designed appropriately. In the framework of a perception-influenced design approach, this paper investigates the optimal rotor spacing distance configuration to minimise noise annoyance. On the basis of a series of psychoacoustic metrics (i.e., loudness, fluctuation strength, roughness, sharpness, and tonality) and psychoacoustic annoyance (PA) models, the optimal rotor axial separation distance (expressed as a function of propeller blade diameter) is at a range from 0.2 to 0.4. This paper also discusses the performance of currently available psychoacoustic models to predict propeller noise annoyance and defines further work to develop a PA model optimised for rotating systems.

Journal ArticleDOI
TL;DR: In this article, an average accuracy of 80% was obtained estimating COVID-19 positive or negative, derived from multiple cough and vowel /a/ recordings, and an average accuracy of 83% by evaluating six symptomatic questions.
Abstract: The COVID-19 outbreak was announced as a global pandemic by the World Health Organization in March 2020 and has affected a growing number of people in the past few months. In this context, advanced artificial intelligence techniques are brought to the forefront as a response to the ongoing fight toward reducing the impact of this global health crisis. In this study, potential use-cases of intelligent speech analysis for COVID-19 identification are being developed. By analyzing speech recordings from COVID-19 positive and negative patients, we constructed audio- and symptomatic-based models to automatically categorize the health state of patients, whether they are COVID-19 positive or not. For this purpose, many acoustic features were established, and various machine learning algorithms are being utilized. Experiments show that an average accuracy of 80% was obtained estimating COVID-19 positive or negative, derived from multiple cough and vowel /a/ recordings, and an average accuracy of 83% was obtained estimating COVID-19 positive or negative patients by evaluating six symptomatic questions. We hope that this study can foster an extremely fast, low-cost, and convenient way to automatically detect the COVID-19 disease.

Journal ArticleDOI
TL;DR: In this paper, a finite-element model is presented for numerical simulation in three dimensions of suspended microparticles in a microchannel embedded in a polymer chip and driven by an attached piezoelectric transducer at MHz frequencies.
Abstract: A finite-element model is presented for numerical simulation in three dimensions of acoustophoresis of suspended microparticles in a microchannel embedded in a polymer chip and driven by an attached piezoelectric transducer at MHz frequencies. In accordance with the recently introduced principle of whole-system ultrasound resonances, an optimal resonance mode is identified that is related to an acoustic resonance of the combined transducer-chip-channel system and not to the conventional pressure half-wave resonance of the microchannel. The acoustophoretic action in the microchannel is of comparable quality and strength to conventional silicon-glass or pure glass devices. The numerical predictions are validated by acoustic focusing experiments on 5-μm-diameter polystyrene particles suspended inside a microchannel, which was milled into a polymethylmethacrylate chip. The system was driven anti-symmetrically by a piezoelectric transducer, driven by a 30-V peak-to-peak alternating voltage in the range from 0.5 to 2.5 MHz, leading to acoustic energy densities of 13 J/m³ and particle focusing times of 6.6 s.

Journal ArticleDOI
TL;DR: This study examines the use of Gaussian process (GP) regression for sound field reconstruction, and a hierarchical Bayesian parameterization is introduced, which enables the construction of a plane wave kernel of variable sparsity.
Abstract: This study examines the use of Gaussian process (GP) regression for sound field reconstruction. GPs enable the reconstruction of a sound field from a limited set of observations based on the use of a covariance function (a kernel) that models the spatial correlation between points in the sound field. Significantly, the approach makes it possible to quantify the uncertainty on the reconstruction in a closed form. In this study, the relation between reconstruction based on GPs and classical reconstruction methods based on linear regression is examined from an acoustical perspective. Several kernels are analyzed for their potential in sound field reconstruction, and a hierarchical Bayesian parameterization is introduced, which enables the construction of a plane wave kernel of variable sparsity. The performance of the kernels is numerically studied and compared to classical reconstruction methods based on linear regression. The results demonstrate the benefits of using GPs in sound field analysis. The hierarchical parameterization shows the overall best performance, adequately reconstructing fundamentally different sound fields. The approach appears to be particularly powerful when prior knowledge of the sound field would not be available.

Journal ArticleDOI
TL;DR: In this article, the equivalence between the expressions obtained with two methods (plane wave angular spectrum decomposition and multipole expansion) for both the acoustic radiation force and torque is formally established.
Abstract: Two main methods have been proposed to derive the acoustical radiation force and torque applied by an arbitrary acoustic field on a particle: The first one relies on the plane wave angular spectrum decomposition of the incident field (see Sapozhnikov and Bailey [J. Acoust. Soc. Am. 133, 661–676 (2013)] for the force and Gong and Baudoin [J. Acoust. Soc. Am. 148, 3131–3140 (2020)] for the torque), while the second one relies on the decomposition of the incident field into a sum of spherical waves, the so-called multipole expansion (see Silva [J. Acoust. Soc. Am. 130, 3541–3544 (2011)] and Baresch, Thomas, and Marchiano [J. Acoust. Soc. Am. 133, 25–36 (2013)] for the force, and Silva, Lobo, and Mitri [Europhys. Lett. 97, 54003 (2012)] and Gong, Marston, and Li [Phys. Rev. Appl. 11, 064022 (2019)] for the torque). In this paper, we formally establish the equivalence between the expressions obtained with these two methods for both the force and torque.

Journal ArticleDOI
TL;DR: In this paper, two sequential sparse Bayesian learning (SBL) based methods are proposed to propagate statistical information across time to improve the performance of the estimation of the time-varying directions of arrival (DOAs) of signals emitted by moving sources.
Abstract: This paper presents methods for the estimation of the time-varying directions of arrival (DOAs) of signals emitted by moving sources. Following the sparse Bayesian learning (SBL) framework, prior information of unknown source amplitudes is modeled as a multi-variate Gaussian distribution with zero-mean and time-varying variance parameters. For sequential estimation of the unknown variance, we present two sequential SBL-based methods that propagate statistical information across time to improve DOA estimation performance. The first method heuristically calculates the parameters of an inverse-gamma hyperprior based on the source signal estimate from the previous time step. In addition, a second sequential SBL method is proposed, which performs a prediction step to calculate the prior distribution of the current variance parameter from the variance parameter estimated at the previous time step. The SBL-based sequential processing provides high-resolution DOA tracking capabilities. Performance improvements are demonstrated by using simulated data as well as real data from the SWellEx-96 experiment.

Journal ArticleDOI
TL;DR: In this paper, a theoretical framework for acoustic wave propagation in a metasurface comprising a hexagonal lattice of hard spherical inclusions embedded in a soft elastic medium is presented.
Abstract: We present a theoretical framework for acoustic wave propagation in a metasurface comprising a hexagonal lattice of hard spherical inclusions embedded in a soft elastic medium. Each layer of inclusions in the direction of sound propagation is approximated as a homogenized layer with effective geometric and material properties. To account for multiple scattering effects in the lattice of resonant inclusions, an analogy between the fluid dynamics of creeping flows and elastodynamics of soft materials is implemented. Results obtained analytically are in excellent agreement with numerical simulations that exactly model the geometric and material properties of the metasurface.

Journal ArticleDOI
TL;DR: In this paper, wave steepening and shock coalescence due to nonlinear propagation effects for a cold Mach 3 jet was investigated for specific frequencies by considering the spatial distribution of the Morfey-Howell indicator in the near and far acoustic fields.
Abstract: Wave steepening and shock coalescence due to nonlinear propagation effects are investigated for a cold Mach 3 jet. The jet flow and near pressure fields are computed using large-eddy simulation. The near acoustic field is propagated to the far field by solving the linearized or the weakly nonlinear Euler equations. Near the angle of peak levels, the skewness factors of the pressure fluctuations for linear and nonlinear propagations display positive values that are almost identical. Thus, the positive asymmetry of the fluctuations originates during the wave generation process and is not due to nonlinear propagation effects. Compressions in the signals are much steeper for a nonlinear than for a linear propagation, highlighting the crucial role of nonlinear distortions in the formation of steepened waves. The power transfers due to nonlinear propagation are examined for specific frequencies by considering the spatial distribution of the Morfey–Howell indicator in the near and far acoustic fields. They are in good agreement with the direct measurements performed by comparing the spectra for nonlinear and linear propagations. This shows the suitability of the Morfey–Howell indicator to characterize nonlinear distortions for supersonic jets.
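
Note: the skewness factor of the pressure fluctuations mentioned in the abstract above is the normalized third central moment of the signal. A minimal sketch, assuming the standard population definition (third central moment divided by the variance to the power 3/2); the function name is hypothetical and the abstract does not state which estimator was used.

```python
def skewness(samples):
    """Skewness factor: third central moment over variance**1.5.

    Positive values indicate that strong positive excursions
    (compressions) outweigh negative ones.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    m2 = sum(c ** 2 for c in centered) / n  # variance
    m3 = sum(c ** 3 for c in centered) / n  # third central moment
    return m3 / m2 ** 1.5
```

A symmetric signal such as a pure sine gives a value close to zero, whereas a signal with occasional sharp positive peaks gives a positive value, which is the asymmetry the abstract attributes to the wave generation process.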

Journal ArticleDOI
TL;DR: In this paper, the effect of using fluid inserts for noise control at high exhaust temperatures was investigated by performing a sequence of large eddy simulations on a typical military-style nozzle, both with and without fluid inserts.
Abstract: The goal of the present investigation is to study the effect of using fluid inserts for noise control at high exhaust temperatures by performing a sequence of large eddy simulations on a typical military-style nozzle, both with and without fluid inserts, at jet inlet total temperature ratios of 2.5, 5, and 7. An exact physics-based splitting of the jet flow-field into its hydrodynamic, acoustic, and thermal components reveals clear evidence of a reduction in the radiation efficiency of Mach waves from the controlled jet. This effect is far more pronounced at afterburner conditions, where the location of the maximum noise reduction is observed to shift upstream with increase in jet temperature, thus matching the maximum location of the jet OASPL directivity. Moreover, the maximum noise reduction achieved at afterburner conditions exceeds that obtained at lower exhaust temperatures. This is encouraging and shows that the effectiveness of the fluid inserts improves with an increase in jet exhaust temperature. Furthermore, by accounting for the effect of bleeding off bypass air for the fluid inserts in the LES simulation, this noise reduction is predicted to be achieved at a conservative thrust loss estimate of under 2% at both laboratory and afterburner operating conditions.

Journal ArticleDOI
TL;DR: In this article, the authors explore data-driven neural embeddings for sound event representation when class labels are absent, instead utilizing proxies of perceptual similarity judgements, and demonstrate the feasibility of the method to develop perceptual models for a wide range of data based on behavioral judgements.
Abstract: Evaluating sound similarity is a fundamental building block in acoustic perception and computational analysis. Traditional data-driven analyses of perceptual similarity are based on heuristics or simplified linear models, and are thus limited. Deep learning embeddings, often using triplet networks, have been useful in many fields. However, such networks are usually trained using large class-labelled datasets. Such labels are not always feasible to acquire. We explore data-driven neural embeddings for sound event representation when class labels are absent, instead utilising proxies of perceptual similarity judgements. Ultimately, our target is to create a perceptual embedding space that reflects animals' perception of sound. We create deep perceptual embeddings for bird sounds using triplet models. In order to deal with the challenging nature of triplet loss training with the lack of class-labelled data, we utilise multidimensional scaling (MDS) pretraining, attention pooling, and a triplet mining scheme. We also evaluate the advantage of triplet learning compared to learning a neural embedding from a model trained on MDS alone. Using computational proxies of similarity judgements, we demonstrate the feasibility of the method to develop perceptual models for a wide range of data based on behavioural judgements, helping us understand how animals perceive sounds.
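
Note: the triplet models mentioned in the abstract above are trained so that an anchor sound lies closer to a perceptually similar sound than to a dissimilar one. A minimal sketch of the commonly used hinge form of the triplet loss on plain embedding vectors, assuming Euclidean distance and a margin of 1.0; both are illustrative choices, the function names are hypothetical, and the MDS pretraining, attention pooling, and triplet mining described in the abstract are not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)
```

The loss only requires relative judgements of the form "A is more similar to B than to C", which is why it suits similarity data that comes without class labels.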

Journal ArticleDOI
TL;DR: In this paper, a semi-analytical method of suppressing acoustic scattering using reinforcement learning (RL) algorithms is presented, in which an RL agent is given control over the design parameters of a planar configuration of cylindrical scatterers in water.
Abstract: This paper presents a semi-analytical method of suppressing acoustic scattering using reinforcement learning (RL) algorithms. We give an RL agent control over design parameters of a planar configuration of cylindrical scatterers in water. These design parameters control the position and radius of the scatterers. As these cylinders encounter an incident acoustic wave, the scattering pattern is described by a function called total scattering cross section (TSCS). Through evaluating the gradients of TSCS and other information about the state of the configuration, the RL agent perturbatively adjusts design parameters, considering multiple scattering between the scatterers. As each adjustment is made, the RL agent receives a reward negatively proportional to the root mean square of the TSCS across a range of wavenumbers. Through maximizing its reward per episode, the agent discovers designs with low scattering. Specifically, the double deep Q-learning network and the deep deterministic policy gradient algorithms are employed in our models. Designs discovered by the RL algorithms performed well when compared to a state-of-the-art optimization algorithm using fmincon.
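The reward described in this abstract (negatively proportional to the root mean square of the TSCS over a range of wavenumbers) might be sketched as follows. The proportionality constant `scale`, the function names, and the sample TSCS values are hypothetical; computing the TSCS itself requires a multiple-scattering solver that is not shown here.

```python
import math


def rms(values):
    """Root mean square of a non-empty sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def scattering_reward(tscs_values, scale=1.0):
    """Reward that is negatively proportional to the RMS of the TSCS.

    `tscs_values` holds one TSCS sample per wavenumber; `scale` is an
    assumed positive constant, not a value taken from the paper.
    """
    return -scale * rms(tscs_values)


if __name__ == "__main__":
    # Lower scattering across the band yields a reward closer to zero.
    print(scattering_reward([3.0, 4.0]))
    print(scattering_reward([0.3, 0.4]))
```

Because the reward is never positive, maximizing it per episode pushes the agent toward configurations whose TSCS is small across the whole wavenumber range rather than at a single frequency.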

Journal ArticleDOI
TL;DR: In this paper, the authors use an LES database of round, isothermal, Mach 0.9 and 1.5 jets to produce an ensemble of realizations for the acoustic field that they project onto a limited set of resolvent modes.
Abstract: Resolvent analysis has demonstrated encouraging results for modeling coherent structures in jets when compared against their data-educed counterparts from high-fidelity large-eddy simulations (LES). We formulate resolvent analysis as an acoustic analogy that relates the near-field resolvent forcing to the near- and far-field pressure. We use an LES database of round, isothermal, Mach 0.9 and 1.5 jets to produce an ensemble of realizations for the acoustic field that we project onto a limited set of resolvent modes. In the near-field, we perform projections on a restricted acoustic output domain, r/D=[5,6], while the far-field projections are performed on a Kirchhoff surface comprising a 100-diameter arc centered at the nozzle. This allows the LES realizations to be expressed in the resolvent basis via a data-deduced, low-rank, cross-spectral density matrix. We find that a single resolvent mode reconstructs the most energetic regions of the acoustic field across Strouhal numbers, St=[0−1], and azimuthal wavenumbers, m=[0,2]. Finally, we present a simple function that results in a rank-1 resolvent model agreeing within 2 dB of the peak noise for both jets.

Journal ArticleDOI
TL;DR: In this article, the audio sound field generated by a parametric array loudspeaker (PAL) is divided into three regions: near field, Westervelt far field, and inverse-law far field.
Abstract: The near and far fields of traditional loudspeakers are differentiated by whether the sound pressure amplitude is inversely proportional to the propagating distance. However, the audio sound field generated by a parametric array loudspeaker (PAL) is more complicated, and in this article it is proposed to be divided into three regions: near field, Westervelt far field, and inverse-law far field. In the near field, the audio sound experiences strong local effects and an efficient quasilinear solution is presented. In the Westervelt far field, local effects are negligible so that the Westervelt equation is used, and in the inverse-law far field, a simpler solution is adopted. It is found that the boundary between the near and Westervelt far fields for audio sound lies at approximately a²/λ − λ/4, where a is transducer radius and λ is ultrasonic wavelength. At large transducer radii and high ultrasonic frequencies, the boundary moves close to the PAL and can be estimated by a closed-form formula. The inverse law holds for audio sound in the inverse-law far field, which in most cases begins more than 10 meters away from the PAL. With the proposed classification, it is convenient to apply appropriate prediction models to different regions.
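The approximate boundary quoted in this abstract, a²/λ − λ/4, is simple to evaluate. The sketch below assumes a sound speed of 343 m/s in air and uses an illustrative 0.1 m radius and 40 kHz carrier; none of these numbers come from the paper, and the function name is hypothetical.

```python
def near_field_boundary(radius_m, ultrasound_hz, sound_speed=343.0):
    """Approximate near field / Westervelt far field boundary in meters.

    Evaluates a^2 / lambda - lambda / 4, where `radius_m` is the
    transducer radius a and lambda is the ultrasonic wavelength.
    The default sound speed is an assumed value for air.
    """
    wavelength = sound_speed / ultrasound_hz
    return radius_m ** 2 / wavelength - wavelength / 4.0


if __name__ == "__main__":
    # Illustrative values only: 0.1 m radius, 40 kHz ultrasound.
    print(near_field_boundary(0.1, 40_000.0))
```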

Journal ArticleDOI
TL;DR: In this paper, the authors explored the perceptual impact of two simplifications of Ambisonics-based binaural reverberation that aim to improve efficiency by reducing the spatial resolution of a reverberant sound field.
Abstract: Reverberation is essential for the realistic auralisation of enclosed spaces. However, it can be computationally expensive to render with high fidelity and, in practice, simplified models are typically used to lower costs while preserving perceived quality. Ambisonics-based methods may be employed to this purpose as they allow us to render a reverberant sound field more efficiently by limiting its spatial resolution. The present study explores the perceptual impact of two simplifications of Ambisonics-based binaural reverberation that aim to improve efficiency. First, a “hybrid Ambisonics” approach is proposed in which the direct sound path is generated by convolution with a spatially dense head related impulse response set, separately from reverberation. Second, the reverberant virtual loudspeaker method (RVL) is presented as a computationally efficient approach to dynamically render binaural reverberation for multiple sources with the potential limitation of inaccurately simulating listener's head rotations. Numerical and perceptual evaluations suggest that the perceived quality of hybrid Ambisonics auralisations of two measured rooms ceased to improve beyond the third order, which is a lower threshold than what was found by previous studies in which the direct sound path was not processed separately. Additionally, RVL is shown to produce auralisations with comparable perceived quality to Ambisonics renderings.

Journal ArticleDOI
TL;DR: This article investigated the effect of musical and pitch aptitude on variability in the learning of Cantonese level tones by Mandarin speakers, who have experience of a contour-tone system.
Abstract: Contrary to studies on speech learning of consonants and vowels, the issue of individual variability is less well understood in the learning of lexical tones. Whereas existing studies have focused on contour-tone learning (Mandarin) by listeners without experience of a tonal language, this study addressed a research gap by investigating the perceptual learning of level-tone contrasts (Cantonese) by learners with experience of a contour-tone system (Mandarin). Critically, we sought to answer the question of how Mandarin listeners' initial perception and learning of Cantonese level-tones are affected by their musical and pitch aptitude. Mandarin-speaking participants completed a pretest, training, and a posttest in the level-tone discrimination and identification (ID) tasks. They were assessed in musical aptitude and speech and nonspeech pitch thresholds before training. The results revealed a significant training effect in the ID task but not in the discrimination task. Importantly, the regression analyses showed an advantage of higher musical and pitch aptitude in perceiving Cantonese level-tone categories. The results explained part of the level-tone learning variability in speakers of a contour-tone system. The finding implies that prior experience of a tonal language does not necessarily override the advantage of listeners' musical and pitch aptitude.

Journal ArticleDOI
TL;DR: In this article, the authors present an analysis of the noise levels in Girona, a 100 000 citizen city in the North-East of Catalonia (Spain), including all the stages of the lockdown.
Abstract: The lockdown measures imposed in Spain in response to COVID-19 led to a marked decrease in the observed urban noise levels. This paper presents an analysis of the noise levels in Girona, a city of 100 000 inhabitants in the North-East of Catalonia (Spain). We present the LAeq levels in four different locations from January 2020 to June 2020, including all the stages of the lockdown. Several comparisons are conducted with the monitoring data available from the previous years (2019, 2018, and 2017, when available). This analysis is part of the project "Sons al Balco," which aims to draw the soundscape of Catalonia during the lockdown. The results of the analysis in Girona show drastic LAeq changes especially in nightlife areas of the city, moderate LAeq changes in commercial and restaurant areas, and low LAeq changes in dense traffic areas.
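For readers unfamiliar with LAeq, the sketch below shows the standard energy-average definition of an equivalent continuous level computed from short-interval A-weighted levels of equal duration. The abstract does not describe the study's actual processing chain, so this is only the textbook formula, and the sample values and function name are made up for illustration.

```python
import math


def equivalent_level(levels_db):
    """Equivalent continuous level (dB) from equal-duration levels in dB.

    Levels are converted to relative energy, averaged, and converted
    back, so loud intervals dominate the result.
    """
    mean_energy = sum(10 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)


if __name__ == "__main__":
    # Hypothetical hourly levels for one site before and during lockdown.
    before = [68.0, 70.0, 72.0, 71.0]
    during = [60.0, 61.0, 63.0, 62.0]
    print(equivalent_level(before) - equivalent_level(during))
```

Comparing the energy averages for matching periods, as in the usage example, gives a single decibel difference per site, which is the kind of quantity a year-on-year comparison of monitoring data would report.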

Journal ArticleDOI
TL;DR: In this article, the authors describe validation of a psychoacoustic model designed to quantify the integral sound of a target voice sample, which includes parameters to characterize the harmonic and inharmonic voice sources, vocal tract transfer function, fundamental frequency, and amplitude of the voice.
Abstract: No agreed-upon method currently exists for objective measurement of perceived voice quality. This paper describes validation of a psychoacoustic model designed to fill this gap. This model includes parameters to characterize the harmonic and inharmonic voice sources, vocal tract transfer function, fundamental frequency, and amplitude of the voice, which together serve to completely quantify the integral sound of a target voice sample. In experiment 1, 200 voices with and without diagnosed vocal pathology were fit with the model using analysis-by-synthesis. The resulting synthetic voice samples were not distinguishable from the original voice tokens, suggesting that the model has all the parameters it needs to fully quantify voice quality. In experiment 2 parameters that model the harmonic voice source were removed one by one, and the voice tokens were re-synthesized with the reduced model. In every case the lower-dimensional models provided worse perceptual matches to the quality of the natural tokens than did the original set, indicating that the psychoacoustic model cannot be reduced in dimensionality without loss of fit to the data. Results confirm that this model can be validly applied to quantify voice quality in clinical and research applications.