
Showing papers on "Microphone array" published in 2022


Journal ArticleDOI
TL;DR: In this paper, a cross-correlation-based direction of arrival (DOA) estimation technique using the time difference of arrival at different microphone pairs, with noise angular spectrum subtraction, is proposed.
Abstract: This paper presents a sound source localization method using an irregular microphone array embedded in a drone. Sound source localization is an integral function of drone audition systems, enabling applications such as search and rescue missions. However, audio recordings from the on-board microphones obscure the sound emitted by a source on the ground due to drone-generated motor and propeller noise, leading to an extremely low signal-to-drone-noise ratio (SdNR). In this paper, we propose a cross-correlation-based direction of arrival (DOA) estimation technique using the time difference of arrival (TDOA) at different microphone pairs, with noise angular spectrum subtraction. Through the measured current-specific drone noise spectrum, noise suppression is achieved on the multi-channel recordings. Experimental results show that the proposed method is capable of estimating the position in three-dimensional space of multiple simultaneously active sound sources on the ground at low SdNR conditions (−30 dB), and of localizing two sound sources at a given azimuth angular separation with a prediction error comparable to that of multiple signal classification (MUSIC)-based algorithms and the generalized cross-correlation with phase transformation (GCC-PHAT) method. Due to its simplicity, applicability to any array geometry, and better robustness against drone noise, the proposed method increases the feasibility of localization under extreme SdNR levels.
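
A minimal reference sketch of the GCC-PHAT building block that TDOA estimators like this rest on (NumPy; the function name is illustrative, and the paper's noise angular-spectrum subtraction step is not reproduced):

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the TDOA between two channels via GCC-PHAT."""
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                   # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # delay in seconds

# Far-field DOA from one mic pair of spacing d (m): theta = arcsin(c * tau / d)
```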

17 citations


Journal ArticleDOI
TL;DR: In this article, a dual-branched spherical convolutional autoencoder is proposed to obtain high-resolution localization results from the conventional spherical beamforming maps while incorporating frequency-variant and distortion-invariant strategies to address the inherent challenges.
Abstract: While sound source localization (SSL) using a spherical microphone array system can be applied to obtain visual beam patterns of source distribution maps in a range of omnidirectional acoustic applications, the present challenges of the spherical measurement system on the valid frequency ranges and the spatial distortion as well as the grid-related limitations of data-driven SSL approaches raise the need to develop an appropriate method. Motivated by these challenges, this study proposes a deep learning (DL) approach to achieve high-resolution localization of multiple sound sources tailored for omnidirectional acoustic applications. First, we present a spherical target map representation that can panoramically pinpoint the position and strength information of multiple sound sources without any grid-related constraints. Then, a dual-branched spherical convolutional autoencoder is proposed to obtain high-resolution localization results from the conventional spherical beamforming maps while incorporating frequency-variant and distortion-invariant strategies to address the inherent challenges. We quantitatively and qualitatively assess our proposed method’s localization capability for multiple sound sources and validate that the proposed method can achieve far more precise and computationally efficient results than the existing approaches. By extension, we newly present an experimental setup that can create omnidirectional acoustic scenarios for multiple SSL. By evaluating our proposed method in this experimental setup, we demonstrate its effectiveness and applicability with the experimental data. Our study delivers the proposed approach’s potential to be utilized in various SSL applications.

12 citations


Proceedings ArticleDOI
27 Jun 2022
TL;DR: This paper presents the design and implementation of SPiDR, an ultra-low-power spatial sensing system for miniature mobile robots that produces a cross-sectional map of the field-of-view using only one speaker/microphone pair.
Abstract: This paper presents the design and implementation of SPiDR, an ultra-low-power spatial sensing system for miniature mobile robots. This acoustic sensor produces a cross-sectional map of the field-of-view using only one speaker/microphone pair. While it is challenging to obtain enough spatial diversity of signal with a single omnidirectional source, we leverage sound's interaction with small structures to create a 3D-printed passive filter, called a stencil, that can project spatially coded signals onto a region at a fine granularity. The system receives a linear combination of the reflections from nearby objects and applies a novel power-aware depth-map reconstruction algorithm. The algorithm first estimates the approximate locations of the objects in the scene and then iteratively applies fractional multi-resolution inversion. SPiDR consumes only 10 mW of power to generate a depth map in real-world scenarios with an over 80% structural similarity score with the scene.

10 citations


Journal ArticleDOI
TL;DR: In this article, an advanced duct microphone array analysis based on a user-friendly iterative Bayesian Inverse Approach (iBIA) has been successfully applied to assess the modal content and the Sound Power Level of fan/outlet guide vane (OGV) broadband noise.

9 citations


Posted ContentDOI
TL;DR: In this paper, a microphone-based sensor system that is able to localize sound events inside an SAV is presented, composed of a Micro-Electro-Mechanical System (MEMS) microphone array with a circular geometry connected to an embedded processing platform that resorts to Field-Programmable Gate Array (FPGA) technology to process the sound localization algorithms in hardware.
Abstract: With the current technological transformation in the automotive industry, autonomous vehicles are getting closer to the Society of Automotive Engineers (SAE) automation level 5. This level corresponds to full vehicle automation, where the driving system autonomously monitors and navigates the environment. With SAE level 5, the concept of a Shared Autonomous Vehicle (SAV) will soon become a reality and mainstream. The main purpose of an SAV is to allow unrelated passengers to share an autonomous vehicle without a driver/moderator inside the shared space. However, to ensure their safety and well-being until they reach their final destination, active monitoring of all passengers is required. In this context, this article presents a microphone-based sensor system that is able to localize sound events inside an SAV. The solution is composed of a Micro-Electro-Mechanical System (MEMS) microphone array with a circular geometry connected to an embedded processing platform that resorts to Field-Programmable Gate Array (FPGA) technology to process the sound localization algorithms in hardware.

8 citations


Journal ArticleDOI
TL;DR: In this article, two linear microphone arrays were used to locate a sound source in an indoor environment, with the generalized cross-correlation algorithm used to calculate the TDOA and handle the delay in the reception of sound signals between the two arrays.
Abstract: Sound signals have been widely applied in various fields. One popular application is sound localization, where the location and direction of a sound source are determined by analyzing the sound signal. In this study, two linear microphone arrays were used to locate a sound source in an indoor environment. The delay between the sound signals received by the two arrays is handled by computing the TDOA with the generalized cross-correlation algorithm. The proposed microphone array system with this algorithm can successfully estimate the sound source’s location. The test was performed in a standardized chamber. The experiment used two microphone arrays, each with two microphones. The experimental results show that the proposed method can detect the sound source with good performance, achieving a position error of about 2.0~2.3 cm and an angle error of about 0.74 degrees. These results demonstrate the feasibility of the system.
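
Position estimation from two bearing-forming arrays is essentially a triangulation step; a minimal 2-D sketch (array positions, angles, and the function name are assumptions, not the paper's exact formulation):

```python
import numpy as np

def triangulate(p1, theta1, p2, theta2):
    """Intersect two bearing lines to estimate a 2-D source position.
    p1, p2: array centre positions; theta1, theta2: estimated azimuths (rad)."""
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    # Solve p1 + t1 * d1 = p2 + t2 * d2 for the range parameters t1, t2.
    t = np.linalg.solve(np.column_stack((d1, -d2)),
                        np.asarray(p2) - np.asarray(p1))
    return np.asarray(p1) + t[0] * d1

# e.g. triangulate((0, 0), np.deg2rad(45), (1, 0), np.deg2rad(135)) -> [0.5, 0.5]
```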

7 citations


Journal ArticleDOI
TL;DR: A generalized framework for describing spatial audio signal processing for the binaural reproduction of recorded sound is proposed, and specific methods for signal transformations such as rotation, translation and enhancement, enabling additional flexibility in reproduction and improvement in the quality of the binaural signal, are presented.
Abstract: Spatial audio has been studied for several decades, but has seen much renewed interest recently due to advances in both software and hardware for capture and playback, and the emergence of applications such as virtual reality and augmented reality. This renewed interest has led to the investment of increasing efforts in developing signal processing algorithms for spatial audio, both for capture and for playback. In particular, due to the popularity of headphones and earphones, many spatial audio signal processing methods have dealt with binaural reproduction based on headphone listening. Among these new developments, processing spatial audio signals recorded in real environments using microphone arrays plays an important role. Following this emerging activity, this paper aims to provide a scientific review of recent developments and an outlook for future challenges. This review also proposes a generalized framework for describing spatial audio signal processing for the binaural reproduction of recorded sound. This framework helps to understand the collective progress of the research community, and to identify gaps for future research. It is composed of five main blocks, namely: the acoustic scene, recording, processing, reproduction, and perception and evaluation. First, each block is briefly presented, and then, a comprehensive review of the processing block is provided. This includes topics from simple binaural recording to Ambisonics and perceptually motivated approaches, which focus on careful array configuration and design. Beamforming and parametric-based processing afford more flexible designs and shift the focus to processing and modeling of the sound field. Then, emerging machine- and deep-learning approaches, which take a further step towards flexibility in design, are described. Finally, specific methods for signal transformations such as rotation, translation and enhancement, enabling additional flexibility in reproduction and improvement in the quality of the binaural signal, are presented. The review concludes by highlighting directions for future research.

7 citations


Journal ArticleDOI
TL;DR: In this paper, a hybrid microphone array signal processing approach combining the beamforming technique and DNNs is proposed for the near-field scenario to identify both the sound source location and content, making it well suited to a sound field containing a dominant, stronger sound source and masked, weaker sound sources.
Abstract: Synchronistical localization, separation, and reconstruction of multiple sound sources are often necessary in various situations, such as conference rooms, living rooms, and supermarkets. To improve the intelligibility of speech signals, the application of deep neural networks (DNNs) has achieved considerable success in time-domain signal separation and reconstruction. In this paper, we propose a hybrid microphone array signal processing approach for the near-field scenario that combines the beamforming technique and DNNs. Using this method, the challenge of identifying both the sound source location and content can be overcome. Moreover, a sequenced virtual sound field reconstruction process makes the proposed approach well suited to a sound field that contains a dominant, stronger sound source and masked, weaker sound sources. Using this strategy, all traceable main sound sources in a given sound field can be discovered by looping the procedure. The operational duration and accuracy of localization are further improved by substituting the broadband weighted multiple signal classification (BW-MUSIC) method for the conventional delay-and-sum (DAS) beamforming algorithm. The effectiveness of the proposed method for localizing and reconstructing speech signals was validated by simulations and experiments with promising results: the localization results were accurate, while the similarity and correlation between the reconstructed and original signals were high.
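
For context on the BW-MUSIC component, the core narrowband MUSIC pseudo-spectrum can be sketched as follows; this far-field linear-array version omits the paper's broadband weighting and near-field steering (all names are illustrative):

```python
import numpy as np

def music_spectrum(csm, mic_x, freq, angles, n_src, c=343.0):
    """Narrowband MUSIC pseudo-spectrum for a linear array.
    csm: (M, M) cross-spectral matrix at freq; mic_x: (M,) mic positions (m)."""
    _, v = np.linalg.eigh(csm)               # eigenvalues in ascending order
    En = v[:, :-n_src]                       # noise subspace (smallest eigenvalues)
    k = 2 * np.pi * freq / c
    spec = np.empty(len(angles))
    for i, th in enumerate(angles):
        a = np.exp(-1j * k * mic_x * np.sin(th))   # far-field steering vector
        spec[i] = 1.0 / np.real(a.conj() @ En @ (En.conj().T @ a))
    return spec                              # peaks indicate source directions
```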

7 citations


Journal ArticleDOI
TL;DR: In this paper, a parametric signal-dependent method for the task of encoding microphone array signals into Ambisonic signals was presented and evaluated in the context of encoding a simulated seven-sensor microphone array mounted on an AR headset device.
Abstract: This article proposes a parametric signal-dependent method for the task of encoding microphone array signals into Ambisonic signals. The proposed method is presented and evaluated in the context of encoding a simulated seven-sensor microphone array, which is mounted on an augmented reality headset device. Given the inherent flexibility of the Ambisonics format, and its popularity within the context of such devices, this array configuration represents a potential future use case for Ambisonic recording. However, due to its irregular geometry and non-uniform sensor placement, conventional signal-independent Ambisonic encoding is particularly limited. The primary aims of the proposed method are to obtain Ambisonic signals over a wider frequency bandwidth, and at a higher spatial resolution, than would otherwise be possible through conventional signal-independent encoding. The proposed method is based on a multi-source sound-field model and employs spatial filtering to divide the captured sound-field into its individual source and directional ambient components, which are subsequently encoded into the Ambisonics format at an arbitrary order. It is demonstrated through both objective and perceptual evaluations that the proposed parametric method outperforms conventional signal-independent encoding in the majority of cases.
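
As a point of reference, conventional signal-independent encoding amounts to a least-squares fit of the array signals onto a spherical-harmonic basis. A simplified sketch that ignores array scattering and radial equalization (sensor directions and helper names are assumptions):

```python
import numpy as np
from scipy.special import sph_harm

def real_sh_matrix(order, azi, zen):
    """Real spherical harmonics Y[(n,m), sensor] for directions (azi, zen)."""
    rows = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            y = sph_harm(abs(m), n, azi, zen)      # complex SH (scipy convention)
            if m < 0:
                rows.append(np.sqrt(2) * (-1) ** m * y.imag)
            elif m == 0:
                rows.append(y.real)
            else:
                rows.append(np.sqrt(2) * (-1) ** m * y.real)
    return np.array(rows)

def encode_ambisonics(x, azi, zen, order=1):
    """Least-squares signal-independent encoder; x: (M, T) mic signals."""
    Y = real_sh_matrix(order, azi, zen)            # ((order+1)^2, M)
    return np.linalg.pinv(Y.T) @ x                 # ((order+1)^2, T) ambisonics
```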

7 citations


Proceedings ArticleDOI
23 May 2022
TL;DR: In this article, the first-order relative harmonic coefficients (RHC) are used to derive a direction vector, which points towards the desired source direction, and two objective metrics, namely localization accuracy and algorithm complexity, are adopted for the evaluation and comparison with existing RHC-based and intensity-based localization approaches, in both simulated and real-life environments.
Abstract: The relative harmonic coefficients (RHC), recently introduced as a multi-microphone spatial feature, demonstrate promising performance when applied to direction-of-arrival (DOA) estimation. All existing RHC-based DOA estimators suffer from a resolution limitation due to the inherent grid-based search. In contrast, this paper utilizes the first-order RHC to propose a closed-form DOA estimator by deriving a direction vector that points towards the desired source direction. Two objective metrics, namely localization accuracy and algorithm complexity, are adopted for the evaluation and comparison with existing RHC-based and intensity-based localization approaches, in both simulated and real-life environments.
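
The paper's closed-form estimator is derived from the first-order RHC; the classical intensity-based baseline it is compared against can be sketched from first-order (B-format) channels roughly as follows (channel naming and the simple time-averaging are assumptions):

```python
import numpy as np

def intensity_doa(w, x, y, z):
    """Closed-form DOA from the time-averaged active intensity vector;
    w: omnidirectional channel, x/y/z: dipole channels."""
    v = np.array([np.mean(w * x), np.mean(w * y), np.mean(w * z)])
    v /= np.linalg.norm(v) + 1e-12           # unit direction vector
    azimuth = np.arctan2(v[1], v[0])
    elevation = np.arcsin(np.clip(v[2], -1.0, 1.0))
    return azimuth, elevation                # radians
```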

Journal ArticleDOI
TL;DR: In this article, the laser Doppler vibrometer (LDV) is used as an optical laser microphone for human-robot interaction in extremely noisy service environments, where the robot irradiates an object near a speaker with a laser and measures the vibration of the object to record the sound.
Abstract: Domestic robots are often required to understand spoken commands in noisy environments, including service appliances' operating sounds. Most conventional domestic robots use electret condenser microphones (ECMs) to record the sound. However, ECMs are known to be sensitive to noise in the direction of sound arrival. The laser Doppler vibrometer (LDV), which has been widely used in the research field of measurement, has the potential to work as a new speech-input device to solve this problem. The aim of this paper is to investigate the effectiveness of using the LDV as an optical laser microphone for human-robot interaction in extremely noisy service environments. Our robot irradiates an object near a speaker with a laser and measures the vibration of the object to record the sound. We conducted three experiments to assess the performance of speech recognition using the optical laser microphone in various settings and showed stable performance in extremely noisy conditions compared with a conventional ECM.

Journal ArticleDOI
TL;DR: In this paper, a method for direction of arrival (DOA) estimation of multiple speech sources based on the temporal correlation and local-frequency stationarity of speech signals is presented.

Proceedings ArticleDOI
23 May 2022
TL;DR: In this article, a tensor singular value decomposition based non-synchronous measurements method for broadband multiple sound source localization is proposed, with which the working frequency range of the microphone array is no longer limited by the array geometry.
Abstract: It is a challenge for state-of-the-art non-synchronous measurements beamforming methods to localize multiple broadband sources due to the difficulty of selecting an appropriate operating frequency without any prior information about the target signals. In this paper, we propose a tensor singular value decomposition based non-synchronous measurements method for broadband multiple sound source localization. By adopting the proposed method, the working frequency range of the microphone array is no longer limited by the array geometry. Moreover, the proposed tensor completion approach, solved via the alternating direction method of multipliers algorithm, provides a sound map with a distinct global view of three different speech signal sources with high accuracy in both simulation and experimental validations.

Journal ArticleDOI
28 Jan 2022-Sensors
TL;DR: In this paper, a combination of a one-step method based on the generalized eigenvalue decomposition (GEVD) and a two-step method based on the adaptive generalized cross-correlation (GCC) was proposed for 3D multiple simultaneous sound source localization.
Abstract: Multiple simultaneous sound source localization (SSL) is one of the most important applications in speech signal processing. One-step algorithms offer low computational complexity (but low accuracy), while two-step methods offer high accuracy (but high computational complexity) for multiple SSL. In this article, a combination of a one-step method based on the generalized eigenvalue decomposition (GEVD) and a two-step method based on the adaptive generalized cross-correlation (GCC) using the phase transform/maximum likelihood (PHAT/ML) filters, along with a novel T-shaped circular distributed microphone array (TCDMA), is proposed for 3D multiple simultaneous SSL. In addition, the low computational complexity of the GCC algorithm is combined with the high accuracy of the GEVD method by using the distributed microphone array to eliminate spatial aliasing and thus obtain more appropriate information. The proposed T-shaped circular distributed microphone array-based adaptive GEVD and GCC-PHAT/ML algorithm (TCDMA-AGGPM) is compared with hierarchical grid refinement (HiGRID), temporal extension of multiple response model of sparse Bayesian learning with spherical harmonic (SH) extension (SH-TMSBL), sound field morphological component analysis (SF-MCA), and time-frequency mixture weight Bayesian nonparametric acoustical holography beamforming (TF-MW-BNP-AHB) methods based on the mean absolute estimation error (MAEE) criterion in noisy and reverberant environments on simulated and real data. The superiority of the proposed method is demonstrated by its high accuracy and low computational complexity for 3D multiple simultaneous SSL.

Proceedings ArticleDOI
23 May 2022
TL;DR: In this paper, a causal array-geometry-agnostic multi-channel personalized speech enhancement (PSE) model was proposed, which can generate a high-quality enhanced signal from arbitrary microphone geometry.
Abstract: With the recent surge in video conferencing tool usage, providing high-quality speech signals and accurate captions has become essential for conducting day-to-day business or connecting with friends and families. Single-channel personalized speech enhancement (PSE) methods show promising results compared with unconditional speech enhancement (SE) methods in these scenarios due to their ability to remove interfering speech in addition to environmental noise. In this work, we leverage spatial information afforded by microphone arrays to further improve such systems’ performance. We investigate the relative importance of speaker embeddings and spatial features. Moreover, we propose a new causal array-geometry-agnostic multi-channel PSE model, which can generate a high-quality enhanced signal from arbitrary microphone geometry. Experimental results show that the proposed geometry-agnostic model outperforms the model trained on a specific microphone array geometry in both speech quality and automatic speech recognition accuracy. We also demonstrate the effectiveness of the proposed approach for unseen array geometries.

Journal ArticleDOI
TL;DR: This paper designs new circular harmonic features that are frequency-invariant as inputs to the CNN architecture, so as to offer improvements in DOA estimation in unseen adverse environments and obtain good adaptation to array imperfections.
Abstract: The problem of direction of arrival (DOA) estimation with a circular microphone array has been addressed with classical source localization methods, such as model-based methods and parametric methods. These methods have the advantage of estimating the DOAs in a blind manner, i.e. with no (or limited) prior knowledge about the sound sources. However, their performance tends to degrade rapidly in noisy and reverberant environments or in the presence of sensor array limitations, such as sensor gain and phase errors. In this paper, we present a new approach by leveraging the strength of a convolutional neural network (CNN)-based deep learning approach. In particular, we design new circular harmonic features that are frequency-invariant as inputs to the CNN architecture, so as to offer improvements in DOA estimation in unseen adverse environments and obtain good adaptation to array imperfections. To our knowledge, such a deep learning approach has not been used in the circular harmonic domain. Experiments performed on both simulated and real data show that our method gives significantly better performance than the recent baseline methods, in a variety of noise and reverberation levels, in terms of the accuracy of the DOA estimation.
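
The paper's exact feature design is not reproduced here, but a standard way to build frequency-invariant circular-harmonic inputs for an open uniform circular array is to take a spatial DFT over the microphones and divide out the plane-wave mode strength, e.g. (illustrative sketch):

```python
import numpy as np
from scipy.special import jv                 # Bessel function of the first kind

def circular_harmonics(p, freq, radius, order, c=343.0):
    """Frequency-normalized circular-harmonic coefficients of an open
    uniform circular array; p: (M,) complex mic pressures at freq."""
    M = len(p)
    phi = 2 * np.pi * np.arange(M) / M       # uniform mic angles
    kr = 2 * np.pi * freq * radius / c
    feats = []
    for n in range(-order, order + 1):
        b = np.sum(p * np.exp(-1j * n * phi)) / M    # spatial DFT
        b /= (1j ** n) * jv(n, kr)           # divide out plane-wave mode strength
        feats.append(b)                      # caution: J_n(kr) has zeros, so
    return np.array(feats)                   # regularization is needed in practice
```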

Proceedings ArticleDOI
13 Jun 2022
TL;DR: Using hybrid beamforming with a 120-microphone near-field array, spatiospectral lobes are observed as multiple peaks in noise spectra at a given field location or as multiple local maxima in noise directivity at a single frequency.
Abstract: Spatiospectral lobes are still-unexplained phenomena seen in noise radiation from multiple high-performance aircraft. These lobes are observed as multiple peaks in noise spectra at a given field location or as multiple local maxima in noise directivity at a single frequency. Using hybrid beamforming with a 120-microphone near-field array, the lobe characteristics are studied for a GE F404 engine installed on a T-7A aircraft at different engine conditions. Both the measured and reconstructed fields show multiple spatiospectral lobes at different engine conditions, and the overall noise directivity is identified as the superposition of multiple distinct lobes. The individual lobes appear, shift aft, and then disappear with increasing frequency, which is opposite to the behavior of the overall noise directivity. The lobes are ray-traced back to the jet centerline to determine an apparent acoustic source location, and it is concluded that each lobe originates from a different source within the jet.

Journal ArticleDOI
TL;DR: In this article, the authors evaluate different DNNs and signal processing-based methods for DOA estimation when attention is applied and propose training strategies for attention-based estimation optimized via a DOA objective.

Journal ArticleDOI
TL;DR: In this article, a novel method based on the off-grid model and group sparsity is proposed to improve the accuracy and efficiency of source identification, where both source locations and strengths are considered in an integrated model to reconstruct the sound field with the microphone array at arbitrary positions.

Proceedings ArticleDOI
03 Jan 2022
TL;DR: In this paper, the authors adapted the delay-and-sum beamforming method and associated deconvolution techniques for microphone measurements that comprise fixed and continuously-scanning sensors.
Abstract: The paper adapts the delay-and-sum beamforming method and associated deconvolution techniques for microphone measurements that comprise fixed and continuously-scanning sensors. The signals from the scanning sensors are non-stationary due to the traversing of a spatially-varying acoustic field. Quasi-stationarity is sought by dividing the signals into smaller blocks and applying a frequency-dependent window within each block. In addition, the motion of the sensors requires a modification to the steering vectors to include a Doppler-shifted frequency. Three distinct methods are used to generate the noise source maps. The first is a natural extension of the delay-and-sum process for continuously-scanning microphone arrays. The source image is obtained by using the distinct contributions from the cross-spectral matrices for each block. This technique shows a suppressed level of the sidelobes and an increased spatial resolution compared to the use of delay-and-sum with fixed sensors. Two additional processes are presented and adapted to the continuous-scan paradigm with the aim of constructing a global cross-spectral matrix that is representative of the complete experimental run. The global cross-spectral matrix is obtained with partial-fields decomposition and a cross-spectral matrix completion technique. These two methods show a higher suppression of the sidelobes compared to the first. All three techniques yield highly-resolved noise source maps of similar quality, attaining very high spatial resolutions. Advanced beamforming and deconvolution techniques are used to further enhance the spatial resolution of the noise source. The techniques are first applied to the imaging of a synthetic distributed source to assess their performance. The methods are then used to obtain the noise source distribution of an imperfectly-expanded supersonic jet that exhibited the phenomenon of screech. It is demonstrated that any of the methodologies introduced allows the resolution of the shock cells in the jet plume.
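
For orientation, the fixed-sensor baseline that the continuous-scan processing extends, a block-averaged cross-spectral matrix followed by a conventional delay-and-sum map, can be sketched as follows; the Doppler-shifted steering vectors and scanning-sensor bookkeeping are not included (names are illustrative):

```python
import numpy as np

def block_csm(x, block_len, f_bin):
    """Welch-style cross-spectral matrix at one frequency bin; x: (M, T)."""
    M, T = x.shape
    win = np.hanning(block_len)
    C = np.zeros((M, M), dtype=complex)
    n_blocks = T // block_len
    for b in range(n_blocks):
        seg = x[:, b * block_len:(b + 1) * block_len] * win
        s = np.fft.rfft(seg, axis=1)[:, f_bin]       # (M,) spectra at the bin
        C += np.outer(s, s.conj())
    return C / n_blocks

def das_map(C, steer):
    """Conventional beamforming map B(g) = g^H C g; steer: (G, M) unit vectors."""
    return np.real(np.einsum('gm,mn,gn->g', steer.conj(), C, steer))
```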

Proceedings ArticleDOI
13 Jun 2022
TL;DR: In this article, the deconvolution approach for the mapping of acoustic sources (DAMAS) algorithm is applied to recent airframe noise test data acquired in the NASA Langley 14- by 22-Foot Subsonic Tunnel.
Abstract: Microphone phased arrays are a common tool for use in aeroacoustic wind tunnel testing. The analysis of acquired array data is known to suffer from decorrelation effects, where the coherence of an acoustic wave measured by a pair of microphones is degraded as the wave passes through a turbulent free shear layer or boundary layer. This paper describes, in detail, how to mitigate the influence of decorrelation effects when processing array data with deconvolution methods. This is done using the Deconvolution Approach for the Mapping of Acoustic Sources (DAMAS) algorithm as an example, applied to recent airframe noise test data acquired in the NASA Langley 14- by 22-Foot Subsonic Tunnel. Two ways of handling the turbulent propagation modeling, both assuming plane wave propagation, are described. Results show that while turbulence model fit parameters may differ, both methods output extremely similar deconvolution results. Further improvements likely require more accurate mean shear layer data prior to developing more involved turbulence models.
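
The DAMAS core that the decorrelation modeling plugs into is a non-negative Gauss-Seidel solve of b ≈ Aq (Brooks and Humphreys); a minimal sketch without the shear-layer coherence-loss corrections that are this paper's focus:

```python
import numpy as np

def damas(b, A, n_iter=1000):
    """DAMAS deconvolution: recover non-negative source powers q from the
    beamform map b (G,) and the array point-spread matrix A (G, G)."""
    q = np.zeros(len(b))
    for _ in range(n_iter):
        for i in range(len(b)):
            r = b[i] - A[i] @ q + A[i, i] * q[i]   # residual excluding q[i] itself
            q[i] = max(r / A[i, i], 0.0)           # enforce non-negativity
    return q
```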

Journal ArticleDOI
TL;DR: In this paper , a novel approach to evaluate dominant railway noise sources for all bogie regions, by using an array of microphones and a physics-informed graph neural networks (GNN) model, was proposed.
Abstract: Identifying and evaluating railway noise sources is beneficial for establishing more targeted suppression and mitigation strategies. This paper proposes a novel approach to evaluate dominant railway noise sources for all bogie regions, by using an array of microphones and a physics-informed graph neural networks (GNN) model. The GNN model encodes not only the acoustic data collected by the microphone array but also the spatial relationship between noise sources and microphones, and it also considers the physical background of sound in terms of Doppler effect and acoustic attenuation. In-situ measurements of railway noises have been taken on an operating metro line to train and validate the GNN model which exhibits satisfactory performance in evaluating the noise sources of interest. The proposed approach enables a more flexible and cost-effective microphone array implementation strategy in comparison to the commonly adopted commercialized acoustic beamforming system.

Journal ArticleDOI
18 Jan 2022-Sensors
TL;DR: In this article, a dual-microphone based sound localization and speech enhancement algorithm is proposed, based on time delay estimation of the signals received by the dual microphones combined with energy difference estimation and controllable beam response power.
Abstract: In order to reduce the complexity and cost of the microphone array, this paper proposes a dual-microphone based sound localization and speech enhancement algorithm. Based on time delay estimation of the signals received by the dual microphones, the method combines energy difference estimation and controllable beam response power to compute the 3D coordinates of the acoustic source and achieve dual-microphone sound localization. Based on the azimuth angle of the acoustic source and an independence analysis of the speech signals, separation of the speaker's signal is achieved. On this basis, Wiener post-filtering is used to amplify the speaker's voice signal and suppress interference, achieving speech enhancement. Experimental results show that the dual-microphone sound localization algorithm proposed in this paper can accurately identify the sound location, and the speech enhancement algorithm is more robust and adaptable than the original algorithm.

Journal ArticleDOI
TL;DR: In this article, a deep-learning-based framework that integrates single-channel noise reduction and multichannel source localization is proposed to address the strong ego-noise from rotating motors and propellers as well as the movement of the drone and the sound sources.
Abstract: Sound source localization from a flying drone is a challenging task due to the strong ego-noise from rotating motors and propellers as well as the movement of the drone and the sound sources. To address this challenge, we propose a deep-learning-based framework that integrates single-channel noise reduction and multichannel source localization. In this framework, we suppress the ego-noise and estimate a time–frequency soft ratio mask with a single-channel deep neural network (DNN). Then, we design two downstream multichannel source localization algorithms, based on steered response power (SRP-DNN) and time–frequency spatial filtering (TFS-DNN). The main novelty lies in the proposed TFS-DNN approach, which estimates the presence probability of the target sound at the individual time–frequency bins by combining the DNN-inferred soft ratio mask and the instantaneous direction of arrival (DOA) of the sound received by the microphone array. The time–frequency presence probability of the target sound is then used to design a set of spatial filters to construct a spatial likelihood map for source localization. By jointly exploiting spectral and spatial information, TFS-DNN robustly processes signals in short segments (e.g., 0.5 s) in dynamic and low signal-to-noise-ratio (SNR) scenarios (e.g., SNR −20 dB). Results on real and simulated data in a variety of scenarios (static sources, moving sources, and moving drones) indicate the advantage of TFS-DNN over competing methods, including SRP-DNN and the state-of-the-art TFS.

Proceedings ArticleDOI
16 Feb 2022
TL;DR: This work proposes to use deep learning techniques to learn competing and time-varying direct-path phase differences for localizing multiple moving sound sources, using a causal convolutional recurrent neural network to extract the direct-path phase difference sequence from the signals of each microphone pair.
Abstract: Multiple moving sound source localization in real-world scenarios remains a challenging issue due to interaction between sources, time-varying trajectories, distorted spatial cues, etc. In this work, we propose to use deep learning techniques to learn competing and time-varying direct-path phase differences for localizing multiple moving sound sources. A causal convolutional recurrent neural network is designed to extract the direct-path phase difference sequence from signals of each microphone pair. To avoid the assignment ambiguity and the problem of uncertain output-dimension encountered when simultaneously predicting multiple targets, the learning target is designed in a weighted sum format, which encodes source activity in the weight and direct-path phase differences in the summed value. The learned direct-path phase differences for all microphone pairs can be directly used to construct the spatial spectrum according to the formulation of steered response power (SRP). This deep neural network (DNN) based SRP method is referred to as SRP-DNN. The locations of sources are estimated by iteratively detecting and removing the dominant source from the spatial spectrum, in which way the interaction between sources is reduced. Experimental results on both simulated and real-world data show the superiority of the proposed method in the presence of noise and reverberation.
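
The SRP construction from per-pair phase differences can be written down directly: for each candidate direction, compare the (here, learned) direct-path phase differences against those predicted by the array geometry. A far-field sketch with assumed shapes and sign conventions:

```python
import numpy as np

def srp_spectrum(phase_diff, mic_pairs, cand_doas, freqs, c=343.0):
    """SRP-style spatial spectrum from per-pair phase differences.
    phase_diff: (P, F) phase differences per mic pair and frequency;
    mic_pairs: (P, 2, 3) mic positions; cand_doas: (D, 3) unit vectors."""
    spec = np.zeros(len(cand_doas))
    for d, u in enumerate(cand_doas):
        for p in range(len(mic_pairs)):
            # Far-field TDOA this pair would see for a source at direction u
            # (the sign convention must match how phase_diff was computed).
            tau = (mic_pairs[p, 0] - mic_pairs[p, 1]) @ u / c
            spec[d] += np.sum(np.cos(phase_diff[p] - 2 * np.pi * freqs * tau))
    return spec   # peak picking plus source removal yields multi-source estimates
```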

Journal ArticleDOI
TL;DR: In this article , a method to effectively perform sound source enhancement from an unmanned aerial vehicle (UAV)-mounted audio recording system is proposed, which uses audio recordings and non-acoustical UAV rotor characteristics to improve rotor noise power spectral density estimation accuracy and robustness.

Journal ArticleDOI
26 May 2022-Sensors
TL;DR: The method presented here takes advantage of a microphone array with processing based on time-domain delay-and-sum beamforming, providing good robustness to noise and the ability to localize a UAV with poor spectral content or to separate two UAVs with different spectral contents.
Abstract: The development of unmanned aerial vehicles (UAVs) opens up many opportunities but also brings some threats. Dealing with these threats is not easy and requires effective techniques. Knowing the location of the threat is essential to deal with a UAV that is displaying disturbing behavior. Many methods exist but can be very limited due to the small size of UAVs or can be outpaced by technological improvements over the years. However, the noise produced by UAVs remains predominant, which gives a good opening for the development of acoustic methods. The method presented here takes advantage of a microphone array with processing based on time-domain delay-and-sum beamforming. In order to obtain a better signal-to-noise ratio, the UAV’s acoustic signature is taken into account in the processing by using a time-frequency representation of the beamformer’s output. Then, only the content related to this signature is considered when calculating the energy in one direction. This method provides good robustness to noise and can localize a UAV with poor spectral content or separate two UAVs with different spectral contents. Simulation results and those of a real flight experiment are reported.
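
A bare-bones time-domain delay-and-sum of the kind this method builds on, with integer-sample alignment and a far-field model; the time-frequency masking by the UAV's acoustic signature described above is not shown (names are illustrative):

```python
import numpy as np

def das_time(x, fs, mic_pos, doa, c=343.0):
    """Time-domain delay-and-sum toward the unit vector doa.
    x: (M, T) signals; mic_pos: (M, 3) positions in metres."""
    delays = mic_pos @ np.asarray(doa) / c       # mics nearer the source lead
    delays -= delays.min()                       # make all delays non-negative
    T = x.shape[1]
    out = np.zeros(T)
    for m in range(x.shape[0]):
        n = int(round(delays[m] * fs))           # integer-sample alignment
        out[n:] += x[m, :T - n]
    return out / x.shape[0]                      # beamformer output
```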

Journal ArticleDOI
TL;DR: In this article, a microphone-based sensor system that is able to localize sound events inside an SAV is presented, composed of a Micro-Electro-Mechanical System (MEMS) microphone array with a circular geometry connected to an embedded processing platform that resorts to Field-Programmable Gate Array (FPGA) technology to process the sound localization algorithms in hardware.
Abstract: With the current technological transformation in the automotive industry, autonomous vehicles are getting closer to the Society of Automotive Engineers (SAE) automation level 5. This level corresponds to full vehicle automation, where the driving system autonomously monitors and navigates the environment. With SAE level 5, the concept of a Shared Autonomous Vehicle (SAV) will soon become a reality and mainstream. The main purpose of an SAV is to allow unrelated passengers to share an autonomous vehicle without a driver/moderator inside the shared space. However, to ensure their safety and well-being until they reach their final destination, active monitoring of all passengers is required. In this context, this article presents a microphone-based sensor system that is able to localize sound events inside an SAV. The solution is composed of a Micro-Electro-Mechanical System (MEMS) microphone array with a circular geometry connected to an embedded processing platform that resorts to Field-Programmable Gate Array (FPGA) technology to process the sound localization algorithms in hardware.

Journal ArticleDOI
TL;DR: In this article, two-dimensional sound source localization based on a diaphragm-type extrinsic Fabry-Perot interferometer (EFPI) fiber microphone array has been studied.