
Showing papers on "Acoustic source localization published in 2015"


Journal ArticleDOI
TL;DR: This paper provides a state-of-the-art survey of sound source localization in robotics, a context that raises distinctive constraints--e.g., embeddability, real-time operation, broadband environments, noise, and reverberation--which are seldom taken into account simultaneously in acoustics or signal processing.

135 citations


Journal ArticleDOI
TL;DR: Cochlear implantation for SSD can offer improved speech understanding in complex listening environments and improved sound source localization in both children and adults.
Abstract: Objective: To assess improvements in sound source localization and speech understanding in complex listening environments after unilateral cochlear implantation for single-sided deafness (SSD). Study design: Nonrandomized, open, prospective case series. Setting: Tertiary referral center. Patients: Nine subjects with a unilateral cochlear implant (CI) for SSD (SSD-CI) were tested. Reference groups for the task of sound source localization included young (n = 45) and older (n = 12) normal-hearing (NH) subjects and 27 bilateral CI (BCI) subjects. Intervention: Unilateral cochlear implantation. Main outcome measures: Sound source localization was tested with 13 loudspeakers in a 180-degree arc in front of the subject. Speech understanding was tested with the subject seated in an 8-loudspeaker sound system arrayed in a 360-degree pattern. Directionally appropriate noise, originally recorded in a restaurant, was played from each loudspeaker. Speech understanding in noise was tested using the AzBio sentence test, and sound source localization was quantified using root mean square error. Results: All CI subjects showed poorer-than-normal sound source localization. SSD-CI subjects showed a bimodal distribution of scores: six subjects had scores near the mean of those obtained by BCI subjects, whereas three had scores just outside the 95th percentile of NH listeners. Speech understanding improved significantly in the restaurant environment when the signal was presented to the side of the CI. Conclusion: Cochlear implantation for SSD can offer improved speech understanding in complex listening environments and improved sound source localization in both children and adults. On tasks of sound source localization, SSD-CI patients typically perform as well as BCI patients and, in some cases, achieve scores at the upper boundary of normal performance.

80 citations
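
The outcome measure above, root mean square localization error over a 13-loudspeaker arc, is simple to compute. Below is a minimal sketch with made-up response data; the angles and error spread are illustrative, not the study's data.

```python
import numpy as np

# Hypothetical example: 13 loudspeakers spanning a 180-degree frontal arc,
# as in the study above. Target angles and simulated listener responses (deg).
targets = np.linspace(-90.0, 90.0, 13)      # true loudspeaker azimuths
responses = targets + np.random.default_rng(0).normal(0.0, 12.0, 13)

# Root mean square localization error, the outcome measure used above.
rms_error = np.sqrt(np.mean((responses - targets) ** 2))
print(f"RMS localization error: {rms_error:.1f} degrees")
```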


Journal ArticleDOI
TL;DR: A grid-based method to estimate the location of multiple sources in a wireless acoustic sensor network, where each sensor node contains a microphone array and only transmits direction-of-arrival (DOA) estimates in each time interval, reducing the transmissions to the central processing node.

80 citations


Journal ArticleDOI
TL;DR: Results of virtual localization tests indicate that accurate localization performance is retained with spherical harmonic representations as low as fourth-order, and several important physical HRTF cues are shown to be present even in a first-order representation.
Abstract: Several methods have recently been proposed for modeling spatially continuous head-related transfer functions (HRTFs) using techniques based on finite-order spherical harmonic expansion. These techniques inherently impart some amount of spatial smoothing to the measured HRTFs. However, the effect this spatial smoothing has on localization accuracy has not been analyzed. Consequently, the relationship between the order of a spherical harmonic representation for HRTFs and the maximum localization ability that can be achieved with that representation remains unknown. The present study investigates the effect that spatial smoothing has on virtual sound source localization by systematically reducing the order of a spherical-harmonic-based HRTF representation. Results of virtual localization tests indicate that accurate localization performance is retained with spherical harmonic representations as low as fourth-order, and several important physical HRTF cues are shown to be present even in a first-order representation. These results suggest that listeners do not rely on the fine details in an HRTF's spatial structure and imply that some of the theoretically derived bounds for HRTF sampling may exceed perceptual requirements.

63 citations
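
The smoothing studied above comes from truncating a spherical harmonic expansion. The sketch below fits a smooth directional function on the sphere by least squares and then keeps only the terms up to fourth order; the sampling grid, the toy "HRTF-like" function, and the chosen orders are illustrative assumptions, not the paper's measured HRTFs.

```python
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, azi, pol):
    """Complex spherical harmonic basis up to a given order.
    azi: azimuth in [0, 2*pi), pol: polar angle in [0, pi]."""
    cols = [sph_harm(m, n, azi, pol)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)

# Hypothetical measurement grid and a smooth toy directional function.
rng = np.random.default_rng(1)
azi = rng.uniform(0, 2 * np.pi, 500)
pol = np.arccos(rng.uniform(-1, 1, 500))
h = 1.0 + 0.5 * np.cos(pol) + 0.3 * np.sin(pol) * np.cos(azi)

# Least-squares SH coefficients at high order, then truncation to 4th order,
# mirroring the spatial smoothing studied above.
Y_hi = sh_matrix(10, azi, pol)
coeffs, *_ = np.linalg.lstsq(Y_hi, h, rcond=None)
n_low = (4 + 1) ** 2                    # number of terms kept at order 4
h_smooth = (Y_hi[:, :n_low] @ coeffs[:n_low]).real
print("max truncation error:", np.abs(h_smooth - h).max())
```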


Journal ArticleDOI
TL;DR: In this paper, the authors used filtered noise bands to constrain listeners' access to interaural level differences (ILDs) and interaural time differences (ITDs) in a sound source localization task.
Abstract: In this report, we used filtered noise bands to constrain listeners' access to interaural level differences (ILDs) and interaural time differences (ITDs) in a sound source localization task. The samples of interest were listeners with single-sided deafness (SSD) who had been fit with a cochlear implant in the deafened ear (SSD-CI). The comparison samples included listeners with normal hearing and bimodal hearing, i.e., with a cochlear implant in 1 ear and low-frequency acoustic hearing in the other ear. The results indicated that (i) sound source localization was better in the SSD-CI condition than in the SSD condition, (ii) SSD-CI patients rely on ILD cues for sound source localization, (iii) SSD-CI patients show functional localization abilities within 1-3 months after device activation and (iv) SSD-CI patients show better sound source localization than bimodal CI patients but, on average, poorer localization than normal-hearing listeners. One SSD-CI patient showed a level of localization within normal limits. We provide an account for the relative localization abilities of the groups by reference to the differences in access to ILD cues.

60 citations


Journal ArticleDOI
TL;DR: A general formulation is presented for the optimum controller in an active system for local sound control in a spatially random primary field, where the sound field in a control region is selectively attenuated using secondary sources, driven by reference sensors, all of which are potentially remote from this control region.
Abstract: A general formulation is presented for the optimum controller in an active system for local sound control in a spatially random primary field. The sound field in a control region is selectively attenuated using secondary sources, driven by reference sensors, all of which are potentially remote from this control region. It is shown that the optimal controller is formed of the combination of a least-squares estimation of the primary source signals from the reference signals, and a least-squares controller driven by the primary source signals themselves. The optimum controller is also calculated using the remote microphone technique, in both the frequency and the time domains. The sound field under control is assumed to be stationary and generated by an array of primary sources, whose source strengths are specified using a spectral density matrix. This can easily be used to synthesize a diffuse primary field, if the primary sources are uncorrelated and far from the control region, but can also generate primary fields dominated by contributions from a particular direction, for example, which is shown to significantly affect the shape of the resulting zone of quiet.

59 citations
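
The first stage of the optimal controller described above is a least-squares estimate of the primary source signals from the reference signals. Below is a minimal per-frequency sketch of that estimation step, with assumed transfer and spectral density matrices standing in for a real acoustic model.

```python
import numpy as np

# Minimal per-frequency sketch of the estimation stage described above:
# least-squares (Wiener) estimate of primary source strengths s from
# reference sensor signals x, given assumed spectral density matrices.
rng = np.random.default_rng(2)

# Hypothetical dimensions: 3 primary sources, 4 reference sensors.
A = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))  # source-to-reference transfer
S_ss = np.eye(3)                     # assumed primary source spectral density matrix
S_nn = 0.01 * np.eye(4)              # sensor noise spectral density

S_xx = A @ S_ss @ A.conj().T + S_nn  # reference auto-spectral matrix
S_sx = S_ss @ A.conj().T             # source/reference cross-spectral matrix

# Optimal estimation filter: s_hat = W x with W = S_sx S_xx^{-1}.
W = S_sx @ np.linalg.inv(S_xx)

# Apply to one snapshot of reference signals.
s = rng.normal(size=3) + 1j * rng.normal(size=3)
x = A @ s + 0.1 * (rng.normal(size=4) + 1j * rng.normal(size=4))
print("estimate:", np.round(W @ x, 2))
print("true:    ", np.round(s, 2))
```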


Journal ArticleDOI
TL;DR: Different approaches to increasing the stability of source position estimation algorithms are discussed, which increases their efficiency in natural conditions.
Abstract: The state of the art of matched field hydroacoustic signal processing is described from the viewpoint of estimating the signal parameters in adaptive antenna arrays. The focus is on methods for solving the problem of source localization in an oceanic waveguide under mismatching effects of different nature, caused by disagreement between the received acoustic field and its model. Different approaches to increase the stability of the algorithms for source position estimation are discussed, which allows an increase in their efficiency in natural conditions.

56 citations
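
Matched field processing, as surveyed above, compares measured array data against modeled replica fields over a grid of candidate source positions. The sketch below computes a Bartlett ambiguity surface with free-space replicas as a stand-in for the waveguide propagation model a real oceanic application would require.

```python
import numpy as np

# Minimal Bartlett matched field processing sketch. Real MFP computes replica
# vectors from an ocean waveguide propagation model; free-space spherical
# spreading stands in here, purely for illustration.
c, f = 1500.0, 200.0                       # sound speed (m/s) and frequency (Hz)
k = 2 * np.pi * f / c
array_z = np.linspace(10, 100, 10)         # vertical array element depths (m)

def replica(r, z):
    """Normalized free-space replica vector for a source at range r, depth z."""
    d = np.hypot(r, array_z - z)
    v = np.exp(-1j * k * d) / d
    return v / np.linalg.norm(v)

d_vec = replica(3000.0, 40.0)              # simulated data: source at 3 km, 40 m

ranges = np.linspace(1000, 5000, 81)
depths = np.linspace(10, 90, 41)
surface = np.array([[np.abs(replica(r, z).conj() @ d_vec) ** 2
                     for r in ranges] for z in depths])
iz, ir = np.unravel_index(surface.argmax(), surface.shape)
print(f"Bartlett peak at range {ranges[ir]:.0f} m, depth {depths[iz]:.0f} m")
```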


Posted Content
TL;DR: It is shown that the high-dimensional acoustic samples indeed lie on a low-dimensional manifold and can be embedded into a low-dimensional space, and a semi-supervised source localization algorithm based on two-microphone measurements is proposed, which recovers the inverse mapping between the acoustic samples and their corresponding locations.
Abstract: Conventional speaker localization algorithms, based merely on the received microphone signals, are often sensitive to adverse conditions such as high reverberation or a low signal-to-noise ratio (SNR). In some scenarios, e.g. in meeting rooms or cars, it can be assumed that the source position is confined to a predefined area, and the acoustic parameters of the environment are approximately fixed. Such scenarios give rise to the assumption that the acoustic samples from the region of interest have a distinct geometrical structure. In this paper, we show that the high-dimensional acoustic samples indeed lie on a low-dimensional manifold and can be embedded into a low-dimensional space. Motivated by this result, we propose a semi-supervised source localization algorithm which recovers the inverse mapping between the acoustic samples and their corresponding locations. The idea is to use an optimization framework based on manifold regularization, which involves smoothness constraints on possible solutions with respect to the manifold. The proposed algorithm, termed Manifold Regularization for Localization (MRL), is implemented in an adaptive manner. The initialization is conducted with only a few labelled samples attached to their respective source locations, and then the system is gradually adapted as new unlabelled samples (with unknown source locations) are received. Experimental results show superior localization performance when compared with a recently presented algorithm based on a manifold learning approach and with the generalized cross-correlation (GCC) algorithm as a baseline.

56 citations
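
The optimization framework above is based on manifold regularization. As a generic illustration (not the paper's MRL algorithm), the sketch below runs Laplacian-regularized least squares on a toy one-dimensional manifold, with only a few labelled samples and a graph Laplacian built from an RBF kernel.

```python
import numpy as np

# Generic illustration of manifold regularization: Laplacian-regularized
# least squares with an RBF kernel on a toy 1-D manifold. Labelled samples
# carry source positions; unlabelled samples only shape the smoothness term.
rng = np.random.default_rng(3)

def rbf(A, B, gamma=2.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

n, n_lab = 60, 8
t = np.sort(rng.uniform(0, 3, n))                   # position along the manifold
X = np.c_[np.cos(t), np.sin(t)] + 0.02 * rng.normal(size=(n, 2))  # "features"

K = rbf(X, X)
W = K * (K > 0.5)                                   # sparse affinity graph
L = np.diag(W.sum(1)) - W                           # graph Laplacian

lab = rng.choice(n, n_lab, replace=False)           # only a few labelled samples
J = np.zeros((n, n)); J[lab, lab] = 1.0
y = np.zeros(n); y[lab] = t[lab]

lam_a, lam_i = 1e-3, 1e-2                           # ambient / manifold weights
alpha = np.linalg.solve(J @ K + lam_a * np.eye(n) + lam_i * L @ K, y)
t_hat = K @ alpha

unlab = np.setdiff1d(np.arange(n), lab)
print("mean abs position error (unlabelled):",
      np.abs(t_hat[unlab] - t[unlab]).mean())
```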


Journal ArticleDOI
TL;DR: A two-step hybrid technique is proposed in this paper for predicting the acoustic source location in anisotropic plates; it always reduced the prediction error, irrespective of whether the final prediction coincided with the actual source location or not.

55 citations


Journal ArticleDOI
TL;DR: In this paper, a Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements is proposed to localize multiple sources at different locations.
Abstract: This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation, nor on monaural segregation. The method starts with a training stage that establishes a locally linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, thus enabling discrimination between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods.

52 citations


Journal ArticleDOI
TL;DR: An in situ wireless SHM system based on an acoustic emission (AE) technique is proposed, enabling localization of acoustic sources that emulate impact damage or audible cracks caused by different objects, such as tools, bird strikes, or strong hail.
Abstract: Structural health monitoring (SHM) is important for reducing the maintenance and operation cost of safety-critical components and systems in offshore wind turbines. This paper proposes an in situ wireless SHM system based on an acoustic emission (AE) technique. Using this technique introduces a number of challenges due to high sampling rate requirements and limitations in communication bandwidth, memory space, and power resources. To overcome these challenges, this paper focuses on two elements: (1) the use of an in situ wireless SHM technique in conjunction with the utilization of low sampling rates; (2) localization of acoustic sources which could emulate impact damage or audible cracks caused by different objects, such as tools, bird strikes, or strong hail, all of which represent abrupt AE events and could affect the structural health of a monitored wind turbine blade. The localization process is performed using features extracted from aliased AE signals based on a developed constraint localization model. To validate the performance of these elements, the proposed system was tested by localizing emulated AE sources acquired in the field.

Journal ArticleDOI
TL;DR: The deconvolution problem is solved with a fast gradient projection method called the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) and compared with a Fourier-based non-negative least squares algorithm; the results indicate that FISTA tends to provide improved spatial resolution and is up to 30% faster and more robust to noise.
Abstract: The localization of sound sources with delay-and-sum (DAS) beamforming is limited by a poor spatial resolution—particularly at low frequencies. Various methods based on deconvolution are examined to improve the resolution of the beamforming map, which can be modeled by a convolution of the unknown acoustic source distribution and the beamformer's response to a point source, i.e., point-spread function. A significant limitation of deconvolution is, however, an additional computational effort compared to beamforming. In this paper, computationally efficient deconvolution algorithms are examined with computer simulations and experimental data. Specifically, the deconvolution problem is solved with a fast gradient projection method called the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), and compared with a Fourier-based non-negative least squares algorithm. The results indicate that FISTA tends to provide an improved spatial resolution and is up to 30% faster and more robust to noise. In the spirit of reproducible research, the source code is available online.
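
FISTA, as used above, is a projected gradient method with momentum. The sketch below applies it to a toy one-dimensional non-negative deconvolution problem with an assumed Gaussian point-spread function; it illustrates the algorithm, not the paper's implementation.

```python
import numpy as np

# Minimal FISTA sketch for non-negative deconvolution, min_{x >= 0} ||Ax - b||^2,
# where A maps a source distribution to a beamforming map via the point-spread
# function. Toy 1-D problem with an assumed Gaussian PSF.
rng = np.random.default_rng(4)
n = 100
psf = np.exp(-0.5 * (np.arange(-10, 11) / 3.0) ** 2)   # toy point-spread function

A = np.zeros((n, n))
for i in range(n):                                     # convolution as a matrix
    for j, w in enumerate(psf):
        k = i + j - 10
        if 0 <= k < n:
            A[i, k] = w

x_true = np.zeros(n); x_true[[20, 50, 53, 80]] = [1.0, 0.8, 0.6, 1.2]
b = A @ x_true + 0.01 * rng.normal(size=n)             # blurred, noisy map

Lip = np.linalg.norm(A, 2) ** 2                        # Lipschitz constant of the gradient
x = np.zeros(n); z = x.copy(); t = 1.0
for _ in range(300):
    grad = A.T @ (A @ z - b)
    x_new = np.maximum(z - grad / Lip, 0.0)            # gradient step + projection
    t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
    z = x_new + ((t - 1) / t_new) * (x_new - x)        # FISTA momentum
    x, t = x_new, t_new

print("recovered peaks at indices:", np.flatnonzero(x > 0.3))
```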

Journal ArticleDOI
TL;DR: Experiments on real data validate the localization algorithm in an everyday scenario, proving that good accuracy can be obtained while saving computational cost in comparison with state-of-the-art techniques.
Abstract: In this paper, we propose a robust and low-complexity acoustic source localization technique based on time differences of arrival (TDOA), which addresses the scenario of distributed sensor networks in 3D environments. Network nodes are assumed to be unsynchronized, i.e., TDOAs between microphones belonging to different nodes are not available. We begin by showing how to select feasible TDOAs for each sensor node, exploiting both geometrical considerations and a characterization of the overall generalized cross correlation (GCC) shape. We then show how to localize sources in the space-range reference frame, where TDOA measurements have a clear geometrical interpretation that can be fruitfully used in the scenario of unsynchronized sensors. In this framework, in fact, the source corresponds to the apex of a hypercone passing through points described by the sole microphone positions and TDOA measurements. The localization problem is therefore approached as a hypercone fitting problem. Finally, in order to improve the robustness of the estimate, we include an outlier detection procedure based on the evaluation of the hypercone fitting residuals. A refinement of the source location estimate is then performed ignoring the contributions coming from outlier measurements. A set of simulations shows the performance of individual blocks of the system, with particular focus on the effect of TDOA selection on source localization and refinement steps. Experiments on real data validate the localization algorithm in an everyday scenario, proving that good accuracy can be obtained while saving computational cost in comparison with state-of-the-art techniques.

Proceedings ArticleDOI
09 Nov 2015
TL;DR: Speaker directional information, obtained using sound source localization from a microphone array, is used to supervise the training of video features that aim to capture other cues: movement of the head, upper body, and hands of active speakers.
Abstract: Active speakers have traditionally been identified in video by detecting their moving lips. This paper demonstrates the same using spatio-temporal features that aim to capture other cues: movement of the head, upper body and hands of active speakers. Speaker directional information, obtained using sound source localization from a microphone array is used to supervise the training of these video features.

Proceedings ArticleDOI
28 Dec 2015
TL;DR: This work proposes a novel method for 3D direction of arrival (DOA) estimation based on the sound intensity vector estimation, via the encoding of the signals of a spherical microphone array from the space domain to the spherical harmonic domain.
Abstract: This work proposes a novel method for 3D direction of arrival (DOA) estimation based on the sound intensity vector estimation, via the encoding of the signals of a spherical microphone array from the space domain to the spherical harmonic domain. The sound intensity vector is estimated on detected single source zones (SSZs), where one source is dominant. A smoothed 2D histogram of these estimates reveals the DOA of the present sources and through an iterative process, accurate 3D DOA information can be obtained. The performance of the proposed method is demonstrated through simulations in various signal-to-noise ratio and reverberation conditions.
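
The DOA estimator above is built on the sound intensity vector in the spherical harmonic domain. The sketch below illustrates the core idea on synthetic first-order (B-format) signals: the time-averaged product of the omnidirectional and dipole components points toward the source. The single-source-zone detection and histogram stages of the paper are omitted.

```python
import numpy as np

# Minimal pseudo-intensity DOA sketch. From first-order spherical-harmonic
# (B-format) signals W, X, Y, Z, the active intensity direction points toward
# the source. Synthetic plane-wave signals stand in for a real array encoding.
rng = np.random.default_rng(5)
azi_true, ele_true = np.deg2rad(60.0), np.deg2rad(20.0)
u = np.array([np.cos(ele_true) * np.cos(azi_true),
              np.cos(ele_true) * np.sin(azi_true),
              np.sin(ele_true)])                      # unit vector toward source

s = rng.normal(size=4096)                             # broadband source signal
W = s + 0.02 * rng.normal(size=4096)                  # omnidirectional component
XYZ = u[:, None] * s + 0.02 * rng.normal(size=(3, 4096))  # dipole components

# Time-averaged intensity estimate (signals are real, so Re{} is implicit).
I = (W * XYZ).mean(axis=1)
I /= np.linalg.norm(I)
azi = np.degrees(np.arctan2(I[1], I[0]))
ele = np.degrees(np.arcsin(I[2]))
print(f"estimated DOA: azimuth {azi:.1f} deg, elevation {ele:.1f} deg")
```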

Journal ArticleDOI
TL;DR: This paper provides a theoretical basis for the generation of two zones based on the creation of sound fields with nulls and the positioning of those nulls at arbitrary positions.
Abstract: An application of current interest in sound reproduction systems is the creation of multizone sound fields, which produce multiple independent sound fields for multiple listeners. The challenge in producing such sound fields is the avoidance of interference between sound zones, which depends on the geometry of the zones and the direction of arrival of the desired sound fields. This paper provides a theoretical basis for the generation of two zones based on the creation of sound fields with nulls and the positioning of those nulls at arbitrary positions. The nulls are created by suppressing low-order mode terms in the sound field expansion. Simulations are presented for the two-dimensional case, which show that suppression of interference is possible across a broad audio frequency range.

Journal ArticleDOI
TL;DR: Frequency-difference MFP unambiguously localized the source in several experimental data sets with an average peak-to-sidelobe ratio of 0.9 dB and can be more robust against environmental mismatch than conventional MFP.
Abstract: Matched field processing (MFP) is an established technique for locating remote acoustic sources in known environments. Unfortunately, environment-to-propagation-model mismatch prevents successful application of MFP in many circumstances, especially those involving high frequency signals. For beamforming applications, this problem was found to be mitigated through the use of a nonlinear array-signal-processing technique called frequency difference beamforming (Abadi et al., 2012). Building on that work, this nonlinear technique was extended to Bartlett MFP, where ambiguity surfaces were calculated at frequencies two orders of magnitude lower than the propagated signal, where the detrimental effects of environmental mismatch are much reduced. Previous work determined that this technique has the ability to localize high-frequency broadband sources in a shallow ocean environment with a sparse vertical array, using both simulated and experimental propagation data. Using simulations, the performance of this technique with horizontal arrays and adaptive signal processing techniques was investigated. Results for signals with frequencies from 10 kHz to 30 kHz that propagated in a 100-m-deep shallow ocean sound channel with a downward refracting sound speed profile will be shown for source-array ranges of one to several kilometers. [Sponsored by the Office of Naval Research.]
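
The frequency-difference idea above forms an "autoproduct" from two in-band frequencies and processes it as if it were a field at the much lower difference frequency. The toy sketch below does this for a free-space field on a vertical array; a real application would use waveguide replicas, and the geometry here is an assumption.

```python
import numpy as np

# Toy sketch of the frequency-difference idea: across an array, the
# "autoproduct" P(f2) * conj(P(f1)) carries the phase of a field at the
# difference frequency df = f2 - f1, where environmental mismatch hurts far
# less. Free-space propagation and this geometry are assumptions; real MFP
# would use ocean waveguide replicas.
c = 1500.0
f1, f2 = 20000.0, 20500.0                  # in-band signal frequencies (Hz)
df = f2 - f1                               # 500 Hz difference frequency
z = np.linspace(10, 100, 16)               # vertical array element depths (m)

def field(f, r, zs):
    d = np.hypot(r, z - zs)
    return np.exp(-2j * np.pi * f * d / c) / d

# "Measured" autoproduct for a source at 2 km range, 40 m depth.
ap_data = field(f2, 2000.0, 40.0) * np.conj(field(f1, 2000.0, 40.0))

ranges = np.linspace(1000, 3000, 81)
amb = []
for r in ranges:
    d = np.hypot(r, z - 40.0)              # candidate source at known depth 40 m
    rep = np.exp(-2j * np.pi * df * d / c) # replica at the *difference* frequency
    rep /= np.linalg.norm(rep)
    amb.append(np.abs(rep.conj() @ ap_data) ** 2)
print("ambiguity peak at range:", ranges[int(np.argmax(amb))], "m")
```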

Journal ArticleDOI
TL;DR: In this paper, an approach to determining the speed of sound in air using standard drain pipes is described, along with an investigation of the temperature dependence of the speed of sound.
Abstract: The opportunity to plot oscillograms and frequency spectra with smartphones creates many options for experiments in acoustics, including several that have been described in this column [1-3]. The activities presented in this paper are intended to complement these applications, and include an approach to determine the sound velocity in air by using standard drain pipes [4] and an outline of an investigation of the temperature dependency of the speed of sound.
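
For a rough worked version of the experiment above: an open-open pipe resonates at f = c / (2L) (ignoring end corrections), and the temperature dependence of the speed of sound in air is approximately c(T) = 331.3 * sqrt(1 + T/273.15) m/s. The pipe length below is an example value, not one from the paper.

```python
import numpy as np

# Back-of-the-envelope version of the classroom experiment above: the
# fundamental resonance of an open-open pipe gives c = 2 * L * f (end
# corrections ignored), and c in air varies with temperature roughly as
# c(T) = 331.3 * sqrt(1 + T / 273.15).
L_pipe = 0.50                       # drain pipe length in metres (example value)
f_meas = 343.0 / (2 * L_pipe)       # pretend this came from the smartphone spectrum

c_est = 2 * L_pipe * f_meas
print(f"estimated speed of sound: {c_est:.0f} m/s")

for T in (0.0, 10.0, 20.0, 30.0):   # air temperature in degrees Celsius
    c_T = 331.3 * np.sqrt(1 + T / 273.15)
    print(f"T = {T:4.1f} C  ->  c = {c_T:.1f} m/s")
```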

Journal ArticleDOI
TL;DR: In this article, the authors have developed a straightforward formula for calculation of pressure field, which is consistent with experimental data in far field and solved full 3D governing equations using numerical methods.
Abstract: Carbon nanotube webs exhibit interesting properties when used as thermo-acoustic projectors. This work studies the thermo-acoustic effect of these sound sources in both the near- and far-field regions. Based on two alternative forms of the energy equation, we have developed a straightforward formula for calculating the pressure field, which is consistent with experimental data in the far field. We have also solved the full 3-D governing equations using numerical methods. Our three-dimensional simulations and experimental data show that pressure waves are highly affected by the dimensions of the sound source in the near field due to interference effects. However, generation of sound waves in the far field is independent of the projector's surface area. Energy analysis of free-standing thermo-acoustic (TA) sound sources shows that aerogel TA sound sources such as CNT-based projectors can act more efficiently than other sources, delivering more than 75% of the alternating input energy to the medium gas up to a frequency of 1 MHz.

Journal ArticleDOI
TL;DR: This work proposes a new method for interactive and continuous editing as well as exploration of modal sound parameters, and develops a compact, low-memory representation of frequency-varying acoustic transfer values at each key point using Prony series.
Abstract: Current linear modal sound models are tightly coupled with their frequency content. Both the modal vibration of object surfaces and the resulting sound radiation depend on the vibration frequency. Whenever the user tweaks modal parameters to adjust frequencies, the modal sound model changes completely, necessitating expensive recomputation of modal vibration and sound radiation. We propose a new method for interactive and continuous editing as well as exploration of modal sound parameters. We start by sampling a number of key points around a vibrating object, and then devise a compact, low-memory representation of frequency-varying acoustic transfer values at each key point using Prony series. We efficiently precompute these series using an adaptive frequency sweeping algorithm and volume-velocity-preserving mesh simplification. At runtime, we approximate acoustic transfer values using standard multipole expansions. Given user-specified modal frequencies, we solve a small least-squares system to estimate the expansion coefficients, and thereby quickly compute the resulting sound pressure value at arbitrary listening locations. We demonstrate the numerical accuracy and runtime performance of our method on a set of comparisons and examples, and evaluate sound quality with user perception studies.

Journal ArticleDOI
TL;DR: In this article, an intensity probe, consisting of four microphones, captured the radiated field to the sideline and aft of a tethered, full-scale military jet aircraft as one engine was operated at multiple engine conditions.
Abstract: Vector acoustic intensity provides both the direction and magnitude of energy flow at the probe location and is, hence, more informative than acoustic pressure measurements. However, this important quantity has seen little application previously in aeroacoustics. In the present work, an intensity probe, consisting of four microphones, captured the radiated field to the sideline and aft of a tethered, full-scale military jet aircraft as one engine was operated at multiple engine conditions. Data from each probe location provide a frequency-dependent map of the sound flow near the aircraft. The vector acoustic intensity is estimated using a recently developed processing technique that extends the upper frequency limit of the traditional cross-spectrum-based calculations. The dominant intensity vectors are traced back to the jet centerline as a method of approximating the extent and location of the source region as a function of frequency. As expected for jet mixing noise sources, the resulting source region estimates contract and move upstream with increasing frequency. A comparison of estimated source regions and intensity directionalities between military and afterburner engine conditions reveals important distinctions in the sound fields.
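
Acoustic intensity from microphone pairs, as used by the four-microphone probe above, is commonly estimated from the imaginary part of the cross-spectrum between two closely spaced pressure microphones. The sketch below demonstrates this p-p estimate on a synthetic plane wave; the spacing, air density, and signal are assumptions, and the paper's processing extends well beyond this basic cross-spectral calculation.

```python
import numpy as np
from scipy.signal import csd

# Minimal two-microphone (p-p) intensity sketch: the active intensity component
# along the probe axis follows from the imaginary part of the cross-spectrum,
# I(f) = -Im{G12(f)} / (rho * 2*pi*f * d). A synthetic plane wave is used here.
rho, c, d, fs = 1.21, 343.0, 0.025, 48000     # air density, mic spacing, sample rate
rng = np.random.default_rng(6)

t = np.arange(fs) / fs                        # one second of data
p1 = np.cos(2 * np.pi * 500.0 * t)            # mic 1
p2 = np.cos(2 * np.pi * 500.0 * (t - d / c))  # mic 2: wave travels from 1 to 2
p1 = p1 + 0.01 * rng.normal(size=fs)
p2 = p2 + 0.01 * rng.normal(size=fs)

f, G12 = csd(p1, p2, fs=fs, nperseg=4096)
I_f = -np.imag(G12) / (rho * 2 * np.pi * np.maximum(f, 1.0) * d)  # per-frequency
I_total = np.trapz(I_f, f)                    # integrate over the band
print(f"estimated intensity: {I_total:.2e} W/m^2 "
      f"(plane-wave theory: {0.5 / (rho * c):.2e}, positive = toward mic 2)")
```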

Journal ArticleDOI
TL;DR: The mechanisms of the planar virtual sound barrier are investigated and it is found that three mechanisms work together in the system, including changing the impedance of the primary source, modal control, and modal rearrangement.
Abstract: This paper proposes to reduce the radiation of a sound source inside a cavity through the baffled opening by using an array of loudspeakers and microphones. The system is called a planar virtual sound barrier because it acts like a concrete sound barrier to block the transmission of sound but does not affect light and air circulation. An analytical model for the planar virtual sound barrier is developed based on the modal superposition method to calculate the sound field in and outside a rectangular cavity with a baffled opening. After the model is verified with numerical simulations, a performance study of the planar virtual sound barrier is carried out based on the proposed analytical model, and then the results are confirmed by experiments. The mechanisms of the planar virtual sound barrier are investigated and it is found that three mechanisms work together in the system, including changing the impedance of the primary source, modal control, and modal rearrangement. It is also found that there exist some frequencies where the sound cannot be controlled if all the secondary sources are on the same plane parallel to the opening, and the reasons behind the phenomenon are explained.

Journal ArticleDOI
27 Aug 2015-PLOS ONE
TL;DR: Data reinforce previous reports that HPDs significantly compromise a variety of auditory perceptual facilities, particularly sound localization due to distortions of high-frequency spectral cues that are important for the avoidance of front-back confusions.
Abstract: Hearing protection devices (HPDs) such as earplugs offer to mitigate noise exposure and reduce the incidence of hearing loss among persons frequently exposed to intense sound. However, distortions of spatial acoustic information and reduced audibility of low-intensity sounds caused by many existing HPDs can make their use untenable in high-risk (e.g., military or law enforcement) environments where auditory situational awareness is imperative. Here we assessed (1) sound source localization accuracy using a head-turning paradigm, (2) speech-in-noise recognition using a modified version of the QuickSIN test, and (3) tone detection thresholds using a two-alternative forced-choice task. Subjects were 10 young normal-hearing males. Four different HPDs were tested (two active, two passive), including two new and previously untested devices. Relative to unoccluded (control) performance, all tested HPDs significantly degraded performance across tasks, although one active HPD slightly improved high-frequency tone detection thresholds and did not degrade speech recognition. Behavioral data were examined with respect to head-related transfer functions measured using a binaural manikin with and without tested HPDs in place. Data reinforce previous reports that HPDs significantly compromise a variety of auditory perceptual facilities, particularly sound localization due to distortions of high-frequency spectral cues that are important for the avoidance of front-back confusions.

Journal ArticleDOI
TL;DR: A numerical and experimental investigation of the acoustic streaming flow in the near field of a circular plane ultrasonic transducer in water is performed, validating the numerical approach and justifying the planar wave assumption in conditions where it is a priori far from obvious.
Abstract: A numerical and experimental investigation of the acoustic streaming flow in the near field of a circular plane ultrasonic transducer in water is performed. The experimental domain is a parallelepipedic cavity delimited by absorbing walls to avoid acoustic reflection, with a top free surface. The flow velocities are measured by particle image velocimetry, leading to well-resolved velocity profiles. The theoretical model is based on a linear acoustic propagation model, which correctly reproduces the acoustic field mapped experimentally using a hydrophone, and an acoustic force term introduced in the Navier-Stokes equations under the plane-wave assumption. Despite the complexity of the acoustic field in the near field, in particular in the vicinity of the acoustic source, a good agreement between the experimental measurements and the numerical results for the velocity field is obtained, validating our numerical approach and justifying the planar wave assumption in conditions where it is a priori far from obvious. The flow structure is found to be correlated with the acoustic field shape. Indeed, the longitudinal profiles of the velocity present a wavering linked to the variations in acoustic intensity along the beam axis and transverse profiles exhibit a complex shape strongly influenced by the transverse variations of the acoustic intensity in the beam. Finally, the velocity in the jet is found to increase as the square root of the acoustic force times the distance from the origin of the jet over a major part of the cavity, after a strong short initial increase, where the velocity scales with the square of the distance from the upstream wall.

Journal ArticleDOI
TL;DR: The performance of an acoustic source localization system using distributed microphones is analyzed over a massive multichannel processing framework in a multi-GPU system to confirm the advantages of suitable GPU architectures in the development of real-time massive acoustic signal processing systems.
Abstract: Highlights: An expert system for passive sound source localization that makes use of multiple GPUs; fine spatial grids and a high number of microphones provide excellent localization; GPU resources for managing a large expert system are described; a complete set of simulations evaluates the performance of the expert system; excellent localization accuracy is achieved even in adverse environments. Sound source localization is an important topic in expert systems involving microphone arrays, such as automatic camera steering systems, human-machine interaction, video gaming or audio surveillance. The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known approach for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm analyzes the sound power captured by an acoustic beamformer on a defined spatial grid, estimating the source location as the point that maximizes the output power. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. Graphics Processing Units (GPUs) are highly parallel programmable co-processors that provide massive computation when the needed operations are properly parallelized. Emerging GPUs offer multiple parallelism levels; however, properly managing their computational resources becomes a very challenging task. In fact, management issues become even more difficult when multiple GPUs are involved, adding one more level of parallelism. In this paper, the performance of an acoustic source localization system using distributed microphones is analyzed over a massive multichannel processing framework in a multi-GPU system. The paper evaluates and points out the influence that the number of microphones and the available computational resources have on the overall system performance. Several acoustic environments are considered to show the impact that noise and reverberation have on the localization accuracy and how the use of massive microphone systems combined with parallelized GPU algorithms can help to mitigate substantially adverse acoustic effects. In this context, the proposed implementation is able to work in real time with high-resolution spatial grids and using up to 48 microphones. These results confirm the advantages of suitable GPU architectures in the development of real-time massive acoustic signal processing systems.
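
The SRP-PHAT functional described above can be written in a few lines: accumulate GCC-PHAT values at the inter-microphone delays implied by each candidate grid point and take the maximum. The toy sketch below uses 4 microphones and a coarse 2-D grid; the massive grids and microphone counts discussed above are exactly what motivate the GPU implementation.

```python
import numpy as np

# Minimal SRP-PHAT sketch: steer a grid of candidate points, sum GCC-PHAT
# values at the corresponding inter-microphone delays, and pick the maximum.
# Toy 2-D setup with 4 microphones and an assumed room geometry.
fs, c = 16000, 343.0
rng = np.random.default_rng(7)
mics = np.array([[0, 0], [3, 0], [3, 3], [0, 3]], float)  # mic positions (m)
src = np.array([2.1, 1.3])                                # true source (m)

s = rng.normal(size=4096)
N = 8192
sigs = []
for m in mics:                         # fractional delay via the frequency domain
    tau = np.linalg.norm(src - m) / c
    S = np.fft.rfft(s, N) * np.exp(-2j * np.pi * np.fft.rfftfreq(N, 1 / fs) * tau)
    sigs.append(np.fft.irfft(S, N) + 0.05 * rng.normal(size=N))

# Precompute PHAT-weighted cross-correlations for every microphone pair.
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
gcc = {}
for i, j in pairs:
    X = np.fft.rfft(sigs[i]) * np.conj(np.fft.rfft(sigs[j]))
    gcc[i, j] = np.fft.irfft(X / (np.abs(X) + 1e-12), N)  # GCC-PHAT, circular lags

# Steered response power over a coarse spatial grid.
xs = np.linspace(0.1, 2.9, 57); ys = np.linspace(0.1, 2.9, 57)
best, best_pt = -np.inf, None
for x in xs:
    for y in ys:
        dist = np.linalg.norm(np.array([x, y]) - mics, axis=1)
        p = 0.0
        for i, j in pairs:
            lag = int(round((dist[i] - dist[j]) * fs / c))  # expected TDOA (samples)
            p += gcc[i, j][lag % N]
        if p > best:
            best, best_pt = p, (x, y)
print("estimated source position:", np.round(best_pt, 2))
```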

Journal ArticleDOI
TL;DR: The experiments show that sound rotation perception when sources and listeners rotate was based on acoustic, visual, and, perhaps, vestibular information and suggest that sound source localization is not based just on acoustics, but is a multisystem process.
Abstract: In four experiments listeners were rotated or were stationary. Sounds came from a stationary loudspeaker or rotated from loudspeaker to loudspeaker around an azimuth array. When either sounds or listeners rotate the auditory cues used for sound source localization change, but in the everyday world listeners perceive sound rotation only when sounds rotate not when listeners rotate. In the everyday world sound source locations are referenced to positions in the environment (a world-centric reference system). The auditory cues for sound source location indicate locations relative to the head (a head-centric reference system), not locations relative to the world. This paper deals with a general hypothesis that the world-centric location of sound sources requires the auditory system to have information about auditory cues used for sound source location and cues about head position. The use of visual and vestibular information in determining rotating head position in sound rotation perception was investigated. The experiments show that sound rotation perception when sources and listeners rotate was based on acoustic, visual, and, perhaps, vestibular information. The findings are consistent with the general hypotheses and suggest that sound source localization is not based just on acoustics. It is a multisystem process.

Journal ArticleDOI
TL;DR: In this paper, a refined volumetric SRP (RV-SRP) was proposed to reduce the computational complexity without sacrificing the accuracy of location estimates, and a refinement step improved on the compromise between complexity and accuracy.
Abstract: This letter proposes an efficient method based on the steered-response power (SRP) technique for sound source localization using microphone arrays: the refined volumetric SRP (RV-SRP). By deploying a sparser volumetric grid, the RV-SRP achieves a significant reduction of the computational complexity without sacrificing the accuracy of location estimates. In addition, a refinement step improves on the compromise between complexity and accuracy. Experiments conducted in both simulated- and real-data scenarios show that the RV-SRP outperforms state-of-the-art methods in accuracy with lower computational cost.

Journal ArticleDOI
TL;DR: It is found that the adaptation of the N1 amplitude was location-specific across localization cues, which can be explained by the existence of auditory cortical neurons that are sensitive to sound source location independent of which cue, ITD or ILD, provides the location information.

Journal ArticleDOI
TL;DR: An improved sound source localization (SSL) method has been developed that is based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) for use with binaural robots equipped with two microphones inside artificial pinnae for localization over the entire azimuth.
Abstract: An improved sound source localization (SSL) method has been developed that is based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) for use with binaural robots equipped with two microphones inside artificial pinnae. The conventional SSL method based on the GCC-PHAT method has two main problems when used on a binaural robot platform: 1) diffraction of sound waves with multipath interference caused by the contours of the robot head, which affects localization accuracy, and 2) front-back ambiguity, which limits the localization range to half the horizontal space. The diffraction problem was overcome by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. The ambiguity problem was overcome by utilizing the amplification effect of the pinnae for localization over the entire azimuth. Experiments conducted using two dummy heads equipped with small or large pinnae showed that localization errors were reduced by 8.91° (3.21° vs. 12.12°) on average with the new time delay factor compared with the conventional GCC-PHAT method and that the success rate for front-back disambiguation using the pinnae amplification effect was 29.76% (93.46% vs. 72.02%) better on average over the entire azimuth than with a conventional head-related transfer function (HRTF)-based method.
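
Two ingredients of the method above can be sketched compactly: GCC-PHAT delay estimation, and a spherical-head time delay. The code below uses the classical Woodworth spherical-head model tau(theta) = a(theta + sin(theta))/c as a stand-in; the paper's actual time delay factor may differ, and the head radius is an assumed value.

```python
import numpy as np

# Sketch of two ingredients discussed above: (1) plain GCC-PHAT TDOA estimation
# for two microphones, and (2) a spherical-head time delay, here the classical
# Woodworth model. This is the textbook spherical-head version, not necessarily
# the paper's exact delay factor.
fs, c, a = 48000, 343.0, 0.0875           # sample rate, sound speed, head radius (m)
rng = np.random.default_rng(8)

def woodworth_itd(theta):
    """Interaural delay for a source at azimuth theta (rad) on a sphere."""
    return a * (theta + np.sin(theta)) / c

theta_true = np.deg2rad(40.0)
tau = woodworth_itd(theta_true)

# Synthesize two microphone signals with that interaural delay.
N = 1 << 14
s = rng.normal(size=N)
F = np.fft.rfftfreq(N, 1 / fs)
left = np.fft.irfft(np.fft.rfft(s) * np.exp(-2j * np.pi * F * tau), N)
right = s.copy()
left += 0.05 * rng.normal(size=N); right += 0.05 * rng.normal(size=N)

# GCC-PHAT and peak picking over circular lags.
X = np.fft.rfft(left) * np.conj(np.fft.rfft(right))
gcc = np.fft.irfft(X / (np.abs(X) + 1e-12), N)
lags = np.r_[np.arange(N // 2), np.arange(-N // 2, 0)]
tau_hat = lags[np.argmax(gcc)] / fs

# Invert the spherical-head model numerically to recover azimuth.
grid = np.linspace(0, np.pi / 2, 9001)
theta_hat = grid[np.argmin(np.abs(woodworth_itd(grid) - tau_hat))]
print(f"true azimuth {np.degrees(theta_true):.1f} deg, "
      f"estimated {np.degrees(theta_hat):.1f} deg")
```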

Proceedings ArticleDOI
17 Dec 2015
TL;DR: Simulation and experiments on a mobile robot suggest that the proposed technique improves TDOA discrimination and brings the additional benefit of modulating the computing load requirement according to voice activity.
Abstract: Localization of sound sources in adverse environments is an important challenge in robot audition. The target sound source is often corrupted by coherent broadband noise, which introduces localization ambiguities as noise is often mistaken as the target source. To discriminate the time difference of arrival (TDOA) parameters of the target source and noise, this paper presents a binary mask for weighted generalized cross-correlation with phase transform (GCC-PHAT). Simulation and experiments on a mobile robot suggest that the proposed technique improves TDOA discrimination. It also brings the additional benefit of modulating the computing load requirement according to voice activity.