Showing papers on "Acoustic source localization" published in 2017


Journal ArticleDOI
TL;DR: An extensive review and classification of SSL techniques and popular tracking methodologies; different facets of SSL as well as its state-of-the-art; evaluation methodologies used for SSL; and a set of challenges and research motivations are presented.

201 citations


Journal ArticleDOI
TL;DR: Common approaches for source localization in WASNs that are focused on different types of acoustic features, namely, the energy of the incoming signals, their time of arrival or time difference of arrival, the direction of arrival (DOA), and the steered response power (SRP) resulting from combining multiple microphone signals are reviewed.
Abstract: Wireless acoustic sensor networks (WASNs) are formed by a distributed group of acoustic-sensing devices featuring audio playing and recording capabilities. Current mobile computing platforms offer great possibilities for the design of audio-related applications involving acoustic-sensing nodes. In this context, acoustic source localization is one of the application domains that have attracted the most attention from the research community over the last decades. In general terms, the localization of acoustic sources can be achieved by studying energy and temporal and/or directional features from the incoming sound at different microphones and using a suitable model that relates those features to the spatial location of the source (or sources) of interest. This paper reviews common approaches for source localization in WASNs that are focused on different types of acoustic features, namely, the energy of the incoming signals, their time of arrival (TOA) or time difference of arrival (TDOA), the direction of arrival (DOA), and the steered response power (SRP) resulting from combining multiple microphone signals. Additionally, we discuss methods not only aimed at localizing acoustic sources but also designed to locate the nodes themselves in the network. Finally, we discuss current challenges and frontiers in this field.

117 citations
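
To make the TDOA feature mentioned in this abstract concrete, here is a minimal sketch that estimates the delay between two microphone signals by brute-force cross-correlation. It is not taken from the reviewed paper; the function name `estimate_delay`, the synthetic pulse, the 16 kHz sample rate, and the 343 m/s speed of sound are all assumptions chosen for illustration.

```python
import math

def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches ref.

    A positive result means sig is a delayed copy of ref.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(sig):
                score += ref[i] * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
SAMPLE_RATE = 16000     # Hz, assumed

# Synthetic test signal: a Gaussian pulse and a copy delayed by 7 samples.
pulse = [math.exp(-0.5 * ((k - 20) / 3.0) ** 2) for k in range(64)]
true_lag = 7
delayed = [0.0] * true_lag + pulse[:-true_lag]

lag = estimate_delay(pulse, delayed, max_lag=16)
tdoa_seconds = lag / SAMPLE_RATE
path_difference_m = tdoa_seconds * SPEED_OF_SOUND
print(lag, tdoa_seconds, path_difference_m)
```

The recovered path difference constrains the source to a hyperbola between the two microphones; several microphone pairs are needed to obtain a position.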


Journal ArticleDOI
TL;DR: This study shows that with end-to-end training and generic preprocessing, the performance of deep residual networks not only surpasses the block level accuracy of linear models on nearly clean environments but also shows robustness to challenging conditions by exploiting the time delay on power information.
Abstract: This study proposes the use of a deep neural network to localize a sound source using an array of microphones in a reverberant environment. During the last few years, applications based on deep neural networks have performed various tasks such as image classification or speech recognition to levels that exceed even human capabilities. In our study, we employ deep residual networks, which have recently shown remarkable performance in image classification tasks even when the training period is shorter than that of other models. Deep residual networks are used to process audio input similar to multiple signal classification (MUSIC) methods. We show that with end-to-end training and generic preprocessing, the performance of deep residual networks not only surpasses the block level accuracy of linear models on nearly clean environments but also shows robustness to challenging conditions by exploiting the time delay on power information.

96 citations


Journal ArticleDOI
TL;DR: The test results indicate that the method of source localization using low power consumption is consistently accurate and is able to determine sound source localization multiple times over an extended period.
Abstract: In this study, we show that the energy-based method of sound source localization can be successfully exploited for sound source localization under low power consumption conditions. Sound source localization is widely applied in battlefield environments where low power consumption is especially crucial and necessary for extending the lifespan of sensor nodes. We propose several variables that may affect the path loss exponent. We provide data showing that energy-based methods of sound source localization can accurately determine the appropriate path loss exponent and can improve localization accuracy. The results of our study also demonstrate that energy-based methods of sound source localization can significantly reduce localization errors by adjusting sensors’ weight coefficients when ambient (background) noise exists. Our test results indicate that our method of source localization using low power consumption is consistently accurate and is able to determine sound source localization multiple times over an extended period.

89 citations
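
As an illustration of the energy-based idea in this abstract, the sketch below assumes a simple decay model in which the energy received at distance d is S / d^alpha, where alpha is the path loss exponent. This model, the grid search, the sensor layout, and the name `locate_by_energy` are assumptions for illustration only, not the authors' method. Working in the log domain removes the unknown source power S: at the true position, log(E_i) + alpha * log(d_i) is the same for every sensor.

```python
import math

def locate_by_energy(sensors, energies, alpha, extent=10.0, grid_step=0.5):
    """Grid-search the position whose distances best explain the energies.

    For each candidate point, log(E_i) + alpha * log(d_i) should equal
    log(S) for all sensors, so the spread of these terms is the cost.
    """
    best_pos, best_cost = None, math.inf
    steps = int(round(extent / grid_step)) + 1
    for ix in range(steps):
        for iy in range(steps):
            x, y = ix * grid_step, iy * grid_step
            terms = []
            for (sx, sy), energy in zip(sensors, energies):
                # Clamp the distance so a candidate on a sensor stays finite.
                d = max(math.hypot(x - sx, y - sy), 1e-6)
                terms.append(math.log(energy) + alpha * math.log(d))
            mean = sum(terms) / len(terms)
            cost = sum((t - mean) ** 2 for t in terms)
            if cost < best_cost:
                best_pos, best_cost = (x, y), cost
    return best_pos

ALPHA = 2.0           # assumed path loss exponent (free-field spreading)
SOURCE_POWER = 100.0  # arbitrary; the method does not need to know it
true_source = (3.0, 6.5)
sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (9.0, 8.0)]
energies = [
    SOURCE_POWER / math.hypot(true_source[0] - sx, true_source[1] - sy) ** ALPHA
    for sx, sy in sensors
]

estimate = locate_by_energy(sensors, energies, ALPHA)
print(estimate)
```

The example is noise-free; with ambient noise, a weighted cost (for instance, down-weighting low-energy sensors) would be the natural place to apply per-sensor weight coefficients.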


Journal ArticleDOI
03 Nov 2017-Sensors
TL;DR: The design and implementation of a UAV-embedded microphone array system for sound source localization in outdoor environments and results confirmed that the SMAS provides highly accurate localization, water resistance, prompt assembly, stable wireless communication, and intuitive information for observers and operators.
Abstract: In search and rescue activities, unmanned aerial vehicles (UAV) should exploit sound information to compensate for poor visual information. This paper describes the design and implementation of a UAV-embedded microphone array system for sound source localization in outdoor environments. Four critical development problems included water-resistance of the microphone array, efficiency in assembling, reliability of wireless communication, and sufficiency of visualization tools for operators. To solve these problems, we developed a spherical microphone array system (SMAS) consisting of a microphone array, a stable wireless network communication system, and intuitive visualization tools. The performance of SMAS was evaluated with simulated data and a demonstration in the field. Results confirmed that the SMAS provides highly accurate localization, water resistance, prompt assembly, stable wireless communication, and intuitive information for observers and operators.

83 citations


Journal ArticleDOI
TL;DR: A new technique is introduced for acoustic source localization in an anisotropic plate by dealing with the non-circular shape of wave fronts, which means the acoustic source can be successfully localized without knowing the material properties of the plate.

53 citations


Journal ArticleDOI
TL;DR: Kernel regression and local linear regression are compared to typical inversion techniques, namely Matched Field Beamforming and the MUSIC algorithm, and it is shown that the machine learning approaches may outperform the inversion techniques.

52 citations


Proceedings ArticleDOI
TL;DR: In this article, a likelihood-based encoding of the network output is proposed for simultaneous detection and localization of multiple sound sources in human-robot interaction, which naturally allows the detection of an arbitrary number of sources.
Abstract: We propose to use neural networks for simultaneous detection and localization of multiple sound sources in human-robot interaction. In contrast to conventional signal processing techniques, neural network-based sound source localization methods require fewer strong assumptions about the environment. Previous neural network-based methods have focused on localizing a single sound source and do not extend to multiple sources in terms of detection and localization. In this paper, we thus propose a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources. In addition, we investigate the use of sub-band cross-correlation information as features for better localization in sound mixtures, as well as three different network architectures based on different motivations. Experiments on real data recorded from a robot show that our proposed methods significantly outperform the popular spatial spectrum-based approaches.

51 citations


Journal ArticleDOI
TL;DR: This paper addresses the problem of estimating the target sound direction of arrival (DoA) for a binaural HAS given access to the noise-free content of the target signal and proposes three different RTF models which have different degrees of accuracy and individualization.
Abstract: Recent hearing aid systems (HASs) can connect to a wireless microphone worn by the talker of interest. This feature gives the HASs access to a noise-free version of the target signal. In this paper, we address the problem of estimating the target sound direction of arrival (DoA) for a binaural HAS given access to the noise-free content of the target signal. To estimate the DoA, we present a maximum-likelihood framework which takes the shadowing effect of the user's head on the received signals into account by modeling the relative transfer functions (RTFs) between the HAS's microphones. We propose three different RTF models which have different degrees of accuracy and individualization. Furthermore, we show that the proposed DoA estimators can be formulated in terms of inverse discrete Fourier transforms to evaluate the likelihood function computationally efficiently. We extensively assess the performance of the proposed DoA estimators for various DoAs, signal to noise ratios, and in different noisy and reverberant situations. The results show that the proposed estimators improve the performance markedly over other recently proposed “informed” DoA estimators.

42 citations


Proceedings ArticleDOI
05 Mar 2017
TL;DR: This paper describes an unsupervised method of adapting deep neural networks (DNNs) for sound source localization (SSL) that improved localization accuracy by a maximum of 20 points for unknown positions and reverberant data.
Abstract: This paper describes an unsupervised method of adapting deep neural networks (DNNs) for sound source localization (SSL). DNN-based SSL achieves high localization accuracy for sound data that are similar to training data. However, the accuracy deteriorates if a sound source is at an unknown position in unknown reverberant environments. We solve the problem by using unsupervised adaptation of the DNNs' parameters to the observed sound signals. Entropy is used as the objective function and minimized to optimize the parameters on the basis of the gradient method. Adaptation without overfitting is achieved by using 1) a parameter adaptation layer, such as linear transform network, and 2) early stopping of the parameter updates. Experimental results indicated that our method improved localization accuracy by a maximum of 20 points for unknown positions and reverberant data.

42 citations
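
The entropy objective described in this abstract can be illustrated with a small sketch. It only evaluates the objective on two hypothetical network outputs; it does not implement the adaptation layer or the gradient updates, and the logit values and function names are assumptions. The point is that a confident posterior over candidate directions has low entropy, which is why minimizing entropy needs no direction labels.

```python
import math

def softmax(logits):
    """Convert raw network scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical outputs over four candidate directions.
confident = softmax([8.0, 0.5, 0.2, 0.1])  # one direction clearly preferred
uncertain = softmax([1.0, 1.0, 1.0, 1.0])  # no preference at all

print(entropy(confident), entropy(uncertain))
```

Because the objective rewards confidence regardless of correctness, the abstract's safeguards (a small adaptation layer and early stopping) matter in practice.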


Journal ArticleDOI
TL;DR: A multiple sound source localization and counting method based on a relaxed sparsity of speech signal is presented; it achieves a higher accuracy of DOA estimation and source counting than existing techniques, with higher efficiency and lower complexity, which makes it suitable for real-time applications.
Abstract: In this work, a multiple sound source localization and counting method based on a relaxed sparsity of speech signal is presented. A soundfield microphone is adopted to overcome the redundancy and complexity of microphone array in this paper. After establishing an effective measure, the relaxed sparsity of speech signals is investigated. According to this relaxed sparsity, we can obtain an extensive assumption that “single-source” zones always exist among the soundfield microphone signals, which is validated by statistical analysis. Based on “single-source” zone detecting, the proposed method jointly estimates the number of active sources and their corresponding DOAs by applying a peak searching approach to the normalized histogram of estimated DOA. The cross distortions caused by multiple simultaneously occurring sources are solved by estimating DOA in these “single-source” zones. The evaluations reveal that the proposed method achieves a higher accuracy of DOA estimation and source counting compared with the existing techniques. Furthermore, the proposed method has higher efficiency and lower complexity, which makes it suitable for real-time applications.
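
The peak-searching step on the normalized DOA histogram can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the bin width, the `min_share` threshold, the local-maximum rule, and the sample DOA values are all assumptions, and the single-source zone detection that would produce the DOA estimates is not shown.

```python
def count_sources(doa_estimates_deg, bin_width=10, min_share=0.15):
    """Return the centre angles of histogram peaks, one per detected source.

    DOA estimates are binned over 0-360 degrees, the histogram is
    normalized to sum to 1, and bins that are local maxima holding at
    least min_share of the estimates are reported as sources.
    """
    n_bins = 360 // bin_width
    hist = [0] * n_bins
    for doa in doa_estimates_deg:
        hist[int((doa % 360) // bin_width) % n_bins] += 1
    total = sum(hist)
    if total == 0:
        return []
    norm = [h / total for h in hist]
    peaks = []
    for b in range(n_bins):
        left = norm[b - 1]              # index -1 wraps around the circle
        right = norm[(b + 1) % n_bins]
        if norm[b] >= min_share and norm[b] > left and norm[b] >= right:
            peaks.append(b * bin_width + bin_width / 2)
    return peaks

# Hypothetical per-zone DOA estimates: two talkers plus one outlier.
estimates = [40, 41, 43, 44, 45, 47, 130, 131, 133, 134, 136, 250]
sources = count_sources(estimates)
print(len(sources), sources)
```

The number of peaks gives the source count and the peak positions give the DOAs, so both quantities come out of the same histogram.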

Journal ArticleDOI
TL;DR: In this paper, the feasibility of using an acoustic metasurface (AMS) with acoustic stop-band property to realize sound insulation with ventilation function is investigated, and an efficient numerical approach is proposed to evaluate its sound insulation performance.

Proceedings ArticleDOI
13 Dec 2017
TL;DR: This paper addresses online outdoor sound source localization using a microphone array embedded in an unmanned aerial vehicle (UAV), combines two MUSIC-based localization methods to cope with the trade-off between latency and noise robustness, and develops data compression based on free lossless audio codec extended to support a 16 ch audio data stream via UDP and a water-resistant microphone array.
Abstract: This paper addresses online outdoor sound source localization using a microphone array embedded in an unmanned aerial vehicle (UAV). In addition to sound source localization, sound source enhancement and a robust communication method are also described. This system is one instance of deployment of our continuously developing open source software for robot audition called HARK (Honda Research Institute Japan Audition for Robots with Kyoto University). To improve the robustness against outdoor acoustic noise, we propose to combine two sound source localization methods based on MUSIC (multiple signal classification) to cope with the trade-off between latency and noise robustness. The standard eigenvalue decomposition based MUSIC (SEVD-MUSIC) has smaller latency but less noise robustness, whereas the incremental generalized singular value decomposition based MUSIC (iGSVD-MUSIC) has higher noise robustness but larger latency. A UAV operator can use an appropriate method according to the situation. A sound enhancement method called online robust principal component analysis (ORPCA) enables the operator to detect a target sound source more easily. To improve the stability of wireless communication, and robustness of the UAV system against weather changes, we developed data compression based on free lossless audio codec (FLAC) extended to support a 16 ch audio data stream via UDP, and developed a water-resistant microphone array. The resulting system successfully worked in an outdoor search and rescue task in ImPACT Tough Robotics Challenge in November 2016.

Proceedings ArticleDOI
05 Mar 2017
TL;DR: This paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot using expectation-maximization (EM) approaches and Bayesian approaches.
Abstract: Intuitive spoken dialogues are a prerequisite for human-robot interaction. In many practical situations, robots must be able to identify and focus on sources of interest in the presence of interfering speakers. Techniques such as spatial filtering and blind source separation are therefore often used, but rely on accurate knowledge of the source location. In practice, sound emitted in enclosed environments is subject to reverberation and noise. Hence, sound source localization must be robust to both diffuse noise due to late reverberation, as well as spurious detections due to early reflections. For improved robustness against reverberation, this paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot. In previous work, we developed speaker localization approaches using expectation-maximization (EM) approaches and using Bayesian approaches. In this paper we propose to combine the EM and Bayesian approach in one framework for improved robustness against reverberation and noise.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: A passive sound localization and classification system is designed and implemented; it can detect the acoustic signature of power tools, and its effectiveness as an early warning system for detecting misuse of machinery is demonstrated.
Abstract: For a wide range of applications in industry, it is sometimes necessary to perform acoustic source localization. In this paper, a passive sound localization and classification system is designed and implemented. Each sensor consists of a microphone array which is used to detect the direction-of-arrival (DoA) of an acoustic signal. Multiple DoA sensors can be combined to form a wireless sensor network. The system can detect the acoustic signature of power tools and the effectiveness of the system to be used as an early warning system to detect misuse of machinery is demonstrated. It is shown that the system can detect the DoA of an acoustic signal with an overall mean estimation error of 7° and can correctly classify the signal source with a classification rate of 71.5%.
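
One way several DoA sensors in a network could be combined into a position estimate is by intersecting their bearing lines. The sketch below is an assumption about how such a combination might look, not the system described in the abstract; the function name, the sensor positions, and the convention that angles are measured counter-clockwise from the x-axis are all hypothetical.

```python
import math

def intersect_bearings(p1, theta1_deg, p2, theta2_deg):
    """Intersect two bearing lines and return the (x, y) crossing point.

    Each line starts at a sensor position and points along its DoA.
    Returns None when the bearings are parallel and no fix exists.
    """
    d1 = (math.cos(math.radians(theta1_deg)), math.sin(math.radians(theta1_deg)))
    d2 = (math.cos(math.radians(theta2_deg)), math.sin(math.radians(theta2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2-D cross product of directions
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Solve p1 + t * d1 = p2 + s * d2 for t.
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

# Two hypothetical sensors 10 m apart, both hearing the same source.
fix = intersect_bearings((0.0, 0.0), 45.0, (10.0, 0.0), 135.0)
parallel = intersect_bearings((0.0, 0.0), 90.0, (10.0, 0.0), 90.0)
print(fix, parallel)
```

With a mean DoA error of several degrees, as reported in the abstract, the position error grows with distance from the sensors, so more than two bearings and a least-squares fit would usually be preferable.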

Journal ArticleDOI
TL;DR: A novel method that improves the efficiency of DAMAS via a wavelet-compressed computational grid rather than via optimizing the DAMAS algorithm, and that largely retains the spatial resolution of DAMAS on the original computational grid.

Proceedings ArticleDOI
05 Mar 2017
TL;DR: Experimental results with real-recorded MAV ego-noise show the superiority of the proposed time-frequency processing framework over the state of the art in performing source localization robustly.
Abstract: We address the problem of sound source localization with a microphone array mounted on a micro aerial vehicle (MAV). Due to the noise generated by motors and propellers, this scenario is characterized by extremely low signal-to-noise ratios (SNR). Based on the observation that the energy of MAV sound recordings is usually concentrated at isolated time-frequency bins, we propose a time-frequency processing framework to address this problem. We first estimate the direction of arrival of the sound at individual time-frequency bins. Then we formulate a set of spatially informed filters pointing at candidate directions in the search space. The output of the filtering tends to present high non-Gaussianity when the spatial filter is steered towards the target sound source. Finally, by measuring the non-Gaussianity of the spatial filtering outputs we build a spatial likelihood function from which we estimate the direction of the target sound. Experimental results with real-recorded MAV ego-noise show the superiority of the proposed method over the state of the art in performing source localization robustly.

Posted Content
TL;DR: It is shown that CNNs operating on cepstrogram and generalized cross-correlogram inputs are able to estimate more reliably the instantaneous range and bearing of transiting motor vessels when the source localization performance of conventional passive ranging methods is degraded.
Abstract: The propagation of sound in a shallow water environment is characterized by boundary reflections from the sea surface and sea floor. These reflections result in multiple (indirect) sound propagation paths, which can degrade the performance of passive sound source localization methods. This paper proposes the use of convolutional neural networks (CNNs) for the localization of sources of broadband acoustic radiated noise (such as motor vessels) in shallow water multipath environments. It is shown that CNNs operating on cepstrogram and generalized cross-correlogram inputs are able to more reliably estimate the instantaneous range and bearing of transiting motor vessels when the source localization performance of conventional passive ranging methods is degraded. The ensuing improvement in source localization performance is demonstrated using real data collected during an at-sea experiment.

Journal ArticleDOI
TL;DR: In this paper, a spatio-temporal filter bank is proposed to extract the sound field information in a specific region and remove noise, and the results indicate that the visibility of the sound fields is enhanced by using the proposed method.

Journal ArticleDOI
Mingjun Jiang, Qingyi Gu, Tadayoshi Aoyama, Takeshi Takaki, Idaku Ishii
TL;DR: In this paper, a concept of vision-based vibration source localization to extract vibration image regions using pixel-level digital filters in a high-frame-rate (HFR) video is proposed.
Abstract: In this paper, a concept of vision-based vibration source localization to extract vibration image regions using pixel-level digital filters in a high-frame-rate (HFR) video is proposed. The method can detect periodic changes in the audio frequency range in image intensities at pixels of vibrating objects. Owing to the acute directivity of the optical image sensor, our HFR-vision-based method can localize a vibration source more accurately than acoustic source localization methods. By applying pixel-level digital filters to clipped region-of-interest (ROI) images, in which the center position of a vibrating object is tracked at a fixed position, our method can reduce the latency effect on a digital filter, which may degrade the localization accuracy in vibration source tracking. Pixel-level digital filters for $128\times 128$ ROI images, which are tracked from $512\times 512$ input images, are implemented on a 1000-frames/s vision platform that can measure vibration distributions at 100 Hz or higher. Our tracking system allows a vibrating object to be tracked in real time at the center of the camera view by controlling a pan-tilt active vision system. We present several experimental tracking results using objects vibrating at high frequencies, which cannot be observed by standard video cameras or the naked human eye, including a flying quadcopter with rotating propellers, and demonstrate its performance in vibration source localization with sub-degree-level angular directivity, which is more acute than a few or more degrees of directivity in acoustic-based source localization.

Journal ArticleDOI
TL;DR: This contribution proposes to use an efficient global optimization method, Differential Evolution, to search for the source locations that maximize the agreement between model and measurement.
Abstract: Conventional beamforming with a microphone array is a well-established method for localizing and quantifying sound sources. It provides estimates for the source strengths on a predefined grid by determining the agreement between the pressures measured and those modeled for a source located at the grid point under consideration. As such, conventional beamforming can be seen as an exhaustive search for those locations that provide a maximum match between measured and modeled pressures. In this contribution, the authors propose to use, instead of the exhaustive search, an efficient global optimization method to search for the source locations that maximize the agreement between model and measurement. Advantages are two-fold. First, the efficient optimization allows for inclusion of more unknowns, such as the source position in three dimensions or environmental parameters such as the speed of sound. Second, the model for the received pressure field can be readily adapted to reflect, for example, the presence of more sound sources or environmental parameters that affect the received signals. For the work considered, the global optimization method, Differential Evolution, is selected. Results with simulated and experimental data show that sources can be accurately identified, including the distance from the source to the array.

Journal ArticleDOI
15 Feb 2017-Sensors
TL;DR: This paper aims to give a comprehensive review of different algorithms for energy-based single and multiple source localization problems, their merits and demerits and to point out possible future research directions.
Abstract: Energy-based source localization is an important problem in wireless sensor networks (WSNs), which has been studied actively in the literature. Numerous localization algorithms, e.g., maximum likelihood estimation (MLE) and nonlinear-least-squares (NLS) methods, have been reported. In the literature, there are relevant review papers for localization in WSNs, e.g., for distance-based localization. However, not much work related to energy-based source localization is covered in the existing review papers. Energy-based methods are proposed and specially designed for a WSN due to its limited sensor capabilities. This paper aims to give a comprehensive review of these different algorithms for energy-based single and multiple source localization problems, their merits and demerits and to point out possible future research directions.

Journal ArticleDOI
TL;DR: In this article, sound propagation through tube arrays is studied as a function of the incident sound direction and the surrounding temperature, and it is shown that the most important factors influencing sound propagation are the incidence angle and the surrounding temperature.


Journal ArticleDOI
TL;DR: The “adaptive perceptual bias” theory was extrapolated and its assumptions investigated by measuring sound source localization in response to acoustic stimuli presented in azimuth to imply looming, stationary, and receding motion in depth, and the results did not support the second hypothesis.
Abstract: Continuous increases of acoustic intensity (up-ramps) can indicate a looming (approaching) sound source in the environment, whereas continuous decreases of intensity (down-ramps) can indicate a receding sound source. From psychoacoustic experiments, an "adaptive perceptual bias" for up-ramp looming tonal stimuli has been proposed (Neuhoff, 1998). This theory postulates that (1) up-ramps are perceptually salient because of their association with looming and potentially threatening stimuli in the environment; (2) tonal stimuli are perceptually salient because of an association with single and potentially threatening biological sound sources in the environment, relative to white noise, which is more likely to arise from dispersed signals and nonthreatening/nonbiological sources (wind/ocean). In the present study, we extrapolated the "adaptive perceptual bias" theory and investigated its assumptions by measuring sound source localization in response to acoustic stimuli presented in azimuth to imply looming, stationary, and receding motion in depth. Participants (N = 26) heard three directions of intensity change (up-ramps, down-ramps, and steady state, associated with looming, receding, and stationary motion, respectively) and three levels of acoustic spectrum (a 1-kHz pure tone, the tonal vowel /ə/, and white noise) in a within-subjects design. We first hypothesized that if up-ramps are "perceptually salient" and capable of eliciting adaptive responses, then they would be localized faster and more accurately than down-ramps. This hypothesis was supported. However, the results did not support the second hypothesis. Rather, the white-noise and vowel conditions were localized faster and more accurately than the pure-tone conditions. These results are discussed in the context of auditory and visual theories of motion perception, auditory attentional capture, and the spectral causes of spatial ambiguity.

Journal ArticleDOI
TL;DR: In this article, an approach to detect a sound source and estimate the radial velocity and distance from the receiver, based on repeat Fourier transformation of the interference pattern formed during motion, is described.
Abstract: The paper describes an approach to detecting a sound source and estimating the radial velocity and distance from the receiver, based on repeat Fourier transformation of the interference pattern formed during motion. The obtained spectrogram contains localized domains of the spectral density of single modes. We estimate the localization domain and spectral density distribution and discuss the resolution of moving sound sources. We present the results of a field experiment and consider the interference immunity of the approach for localizing a source using a single receiver.

Journal ArticleDOI
TL;DR: Due to the use of retarded time approach, the proposed inverse technique avoids the interpolation of measured pressure that is needed in the time-domain rotating beamforming, thus providing the ability of real-time calculation of source strengths.

Journal ArticleDOI
TL;DR: In this article, the authors provide theoretical details and experimental validation results to the approach proposed by Minotti et al. for measuring amplitudes and phases of acoustic velocity components (AVC) that are waveform parameters of each component of velocity induced by an acoustic wave, in fully turbulent duct flows carrying multitone acoustic waves.
Abstract: The present study provides theoretical details and experimental validation results to the approach proposed by Minotti et al. (Aerosp Sci Technol 12(5):398–407, 2008) for measuring amplitudes and phases of acoustic velocity components (AVC) that are waveform parameters of each component of velocity induced by an acoustic wave, in fully turbulent duct flows carrying multi-tone acoustic waves. Theoretical results support that the turbulence rejection method proposed, based on the estimation of cross power spectra between velocity measurements and a reference signal such as a wall pressure measurement, provides asymptotically efficient estimators with respect to the number of samples. Furthermore, it is shown that the estimator uncertainties can be simply estimated, accounting for the characteristics of the measured flow turbulence spectra. Two laser-based measurement campaigns were conducted in order to validate the acoustic velocity estimation approach and the uncertainty estimates derived. While in previous studies estimates were obtained using laser Doppler velocimetry (LDV), it is demonstrated that high-repetition rate particle image velocimetry (PIV) can also be successfully employed. The two measurement techniques provide very similar acoustic velocity amplitude and phase estimates for the cases investigated, that are of practical interest for acoustic liner studies. In a broader sense, this approach may be beneficial for non-intrusive sound emission studies in wind tunnel testings.

Journal ArticleDOI
TL;DR: In this paper, a spherical microphone array with polyhedral discretization is compared with a spherical array with a slightly different geometry, and two criteria are introduced to improve the noise source map.

Journal ArticleDOI
TL;DR: A new approach that requires only four low-sampling-rate sensors to localize an acoustic source in an anisotropic plate is proposed; it improves the accuracy of localization prediction.