
Showing papers on "Acoustic source localization published in 2006"


Book
01 Jan 2006
TL;DR: Theory and applications of Acoustic MIMO systems, Wiener Filter and Basic Adaptive Algorithms, and Frequency-Domain Adaptive Filters are studied.
Abstract: Theory: Acoustic MIMO Systems; Wiener Filter and Basic Adaptive Algorithms; Sparse Adaptive Filters; Frequency-Domain Adaptive Filters; Blind Identification of Acoustic MIMO Systems; Separation and Suppression of Co-Channel and Temporal Interference. Applications: Acoustic Echo Cancellation and Audio Bridging; Time Delay Estimation and Acoustic Source Localization; Speech Enhancement and Noise Reduction; Source Separation and Speech Dereverberation.

192 citations


Proceedings ArticleDOI
31 Oct 2006
TL;DR: The design, implementation, and evaluation of the Acoustic Embedded Networked Sensing Box (ENSBox), a platform for prototyping rapid-deployable distributed acoustic sensing systems, particularly distributed source localization, are presented.
Abstract: We present the design, implementation, and evaluation of the Acoustic Embedded Networked Sensing Box (ENSBox), a platform for prototyping rapid-deployable distributed acoustic sensing systems, particularly distributed source localization. Each ENSBox integrates an ARM processor running Linux and supports key facilities required for source localization: a sensor array, wireless network services, time synchronization, and precise self-calibration of array position and orientation. The ENSBox's integrated, high precision self-calibration facility sets it apart from other platforms. This self-calibration is precise enough to support acoustic source localization applications in complex, realistic environments: e.g., 5 cm average 2D position error and 1.5 degree average orientation error over a partially obstructed 80x50 m outdoor area. Further, our integration of array orientation into the position estimation algorithm is a novel extension of traditional multilateration techniques. We present the result of several different test deployments, measuring the performance of the system in urban settings, as well as forested, hilly environments with obstructing foliage and 20-30 m distances between neighboring nodes.

182 citations


Patent
11 Aug 2006
TL;DR: A beamforming process attenuates sound source signals arriving from directions symmetrical with respect to a perpendicular to the straight line connecting two microphones 10 and 11, by multiplying the output signals from the microphones 10 and 11, after spectrum analysis, by weighting coefficients that are complex conjugates of each other.
Abstract: A sound source signal from a target sound source is allowed to be separated from a mixed sound which consists of sound source signals emitted from a plurality of sound sources without being affected by uneven sensitivity of microphone elements. A beamformer section 3 of a source separation device 1 performs beamforming processing for attenuating sound source signals arriving from directions symmetrical with respect to a perpendicular line to a straight line connecting two microphones 10 and 11 respectively by multiplying output signals from the microphones 10 and 11 after spectrum analysis by weighted coefficients which are complex conjugate to each other. Power computation sections 40 and 41 compute power spectrum information, and target sound spectrum extraction sections 50 and 51 extract spectrum information of a target sound source based on a difference between the power spectrum information.
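A minimal sketch of the complex-conjugate beamforming idea described above (hypothetical sample rate, spacing and null direction; not the patent's actual implementation):

```python
# Minimal sketch of the two-microphone, complex-conjugate beamforming idea
# (hypothetical parameters; not the patent's actual implementation).
import numpy as np

fs = 16000            # sample rate [Hz] (assumed)
d = 0.05              # microphone spacing [m] (assumed)
c = 343.0             # speed of sound [m/s]
theta = np.deg2rad(30.0)                      # beamformer null angle from broadside
n_fft = 512
freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
tau = d * np.sin(theta) / c                   # inter-microphone delay at angle theta
w = np.exp(1j * 2 * np.pi * freqs * tau)      # steering phase

def beamform_pair(X1, X2):
    """X1, X2: one STFT frame per microphone, shape (n_fft//2 + 1,)."""
    y_a = X1 - w * X2             # null toward one of the mirrored directions
    y_b = X1 - np.conj(w) * X2    # conjugate weight: null toward the other direction
    return y_a, y_b

def extract_target(X1, X2):
    """Emphasize one side via the power-spectrum difference of the two outputs."""
    y_a, y_b = beamform_pair(X1, X2)
    return np.maximum(np.abs(y_b) ** 2 - np.abs(y_a) ** 2, 0.0)
```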

110 citations


Patent
23 Feb 2006
TL;DR: An array speaker apparatus comprises an array of speaker units arranged in a single body, a sound source localization adding unit which generates left and right audio signals by performing localization processing on the audio signals of a front-left channel and a front-right channel on the basis of head transfer functions, and a sound emitting direction control unit which distributes the left and right audio signals to one or plural speaker units and controls the timing with which the speaker units output them, so that the left and right sounds emitted from the array speaker form the same wavefronts as sounds emitted from two virtual point sound sources.
Abstract: An array speaker apparatus includes an array speaker in which plural speaker units are arranged in a single body, a sound source localization adding unit which generates left and right audio signals by performing localization processing for adding sound characteristics to audio signals of a front-left channel and a front-right channel on the basis of head transfer functions, and a sound emitting direction control unit which distributes the left and right audio signals to one or plural speaker units of the array speaker, and controls timing with which the speaker units output the audio signals so that a left sound emitted from the array speaker forms the same sound wavefront formed by a sound emitted from one of virtual point sound sources and that a right sound emitted from the array speaker forms the same sound wavefront formed by a sound emitted from the other of the virtual point sound sources.

106 citations


Journal ArticleDOI
01 Oct 2006
TL;DR: A biologically inspired and technically implemented sound localization system to robustly estimate the position of a sound source in the frontal azimuthal half-plane that is able to localize audible signals, for example human speech signals, even in reverberating environments.
Abstract: This paper proposes a biologically inspired and technically implemented sound localization system to robustly estimate the position of a sound source in the frontal azimuthal half-plane. For localization, binaural cues are extracted using cochleagrams generated by a cochlear model that serve as input to the system. The basic idea of the model is to separately measure interaural time differences and interaural level differences for a number of frequencies and process these measurements as a whole. This leads to two-dimensional frequency versus time-delay representations of binaural cues, so-called activity maps. A probabilistic evaluation is presented to estimate the position of a sound source over time based on these activity maps. Learned reference maps for different azimuthal positions are integrated into the computation to gain time-dependent discrete conditional probabilities. At every timestep these probabilities are combined over frequencies and binaural cues to estimate the sound source position. In addition, they are propagated over time to improve position estimation. This leads to a system that is able to localize audible signals, for example human speech signals, even in reverberating environments

102 citations


Journal ArticleDOI
TL;DR: In this article, the authors use a method based on the combined measurement of the instantaneous sound pressure and sound particle velocity; a detailed analysis of the influence of the calibration, the source type, the source height, the sound incidence angle, and the sample size is included.
Abstract: Acoustic surface impedance of sound absorbing materials can be measured by several techniques such as the impedance tube for normal impedance or the Tamura method for normal and oblique surface impedance. In situ, the acoustic impedance is mostly measured by use of impulse methods or by applying two-microphone techniques. All these techniques are based on the determination of the sound pressure at specific locations. In this paper, the authors use a method which is based on the combined measurement of the instantaneous sound pressure and sound particle velocity. A brief description of the measurement technique and a detailed analysis of the influence of the calibration, the source type, the source height, the sound incidence angle, and the sample size are included.

92 citations


Proceedings ArticleDOI
14 May 2006
TL;DR: A new robust sound source localization and tracking method using an array of eight microphones using a steered beamformer based on the reliability-weighted phase transform (RWPHAT) along with a particle filter-based tracking algorithm is presented.
Abstract: In this paper we present a new robust sound source localization and tracking method using an array of eight microphones (US patent pending). The method uses a steered beamformer based on the reliability-weighted phase transform (RWPHAT) along with a particle filter-based tracking algorithm. The proposed system is able to estimate both the direction and the distance of the sources. In a videoconferencing context, the direction was estimated with an accuracy better than one degree while the distance was accurate within 10% RMS. Tracking of up to three simultaneous moving speakers is demonstrated in a noisy environment.
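The RWPHAT weighting itself is not spelled out in the abstract, so the sketch below shows only the underlying plain GCC-PHAT delay estimate for one microphone pair, on which the reliability weighting is layered:

```python
# Plain GCC-PHAT delay estimate for one microphone pair; the paper's RWPHAT
# adds per-frequency reliability weights on top of this basic transform.
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    R = X1 * np.conj(X2)
    R /= np.abs(R) + 1e-12                       # PHAT: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                            # estimated inter-channel delay [s]
```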

91 citations


Journal ArticleDOI
TL;DR: The flow reversal theorem is used to show that the cross-correlation function of ambient noise provides an estimate of a combination of the Green's functions corresponding to sound propagation in opposite directions between the two receivers.
Abstract: We study long-range correlation of diffuse acoustic noise fields in an arbitrary inhomogeneous, moving fluid. The flow reversal theorem is used to show that the cross-correlation function of ambient noise provides an estimate of a combination of the Green's functions corresponding to sound propagation in opposite directions between the two receivers. Measurements of the noise cross correlation allow one to quantify flow-induced acoustic nonreciprocity and evaluate both spatially averaged flow velocity and sound speed between the two points.
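A simplified reading of the final claim, assuming a straight path with uniform flow between the two receivers (the paper treats general inhomogeneous moving media): once the cross-correlation yields the travel times in the two opposite directions, the path-averaged sound speed and flow velocity follow from a simple difference relation. Illustrative numbers only:

```python
# Idealized case: straight path of length d with uniform flow between the receivers.
# The cross-correlation yields the travel times in the two directions,
# t_down = d / (c + v) and t_up = d / (c - v), from which c and v follow.
d = 100.0          # receiver separation [m] (made up)
t_down = 0.28818   # travel time with the flow [s] (made up)
t_up = 0.29499     # travel time against the flow [s] (made up)

c = 0.5 * d * (1.0 / t_down + 1.0 / t_up)   # path-averaged sound speed  (~343 m/s)
v = 0.5 * d * (1.0 / t_down - 1.0 / t_up)   # path-averaged flow velocity (~4 m/s)
```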

91 citations


Journal ArticleDOI
TL;DR: The proposed algorithm, although relying on an iterative optimization scheme, proved efficient enough for real-time operation and provides source localization accuracy superior to the standard spherical and linear intersection techniques.
Abstract: In this work, we propose an algorithm for acoustic source localization based on time delay of arrival (TDOA) estimation. In earlier work by other authors, an initial closed-form approximation was first used to estimate the true position of the speaker followed by a Kalman filtering stage to smooth the time series of estimates. In the proposed algorithm, this closed-form approximation is eliminated by employing a Kalman filter to directly update the speaker's position estimate based on the observed TDOAs. In particular, the TDOAs comprise the observation associated with an extended Kalman filter whose state corresponds to the speaker's position. We tested our algorithm on a data set consisting of seminars held by actual speakers. Our experiments revealed that the proposed algorithm provides source localization accuracy superior to the standard spherical and linear intersection techniques. Moreover, the proposed algorithm, although relying on an iterative optimization scheme, proved efficient enough for real-time operation.
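A minimal sketch of the central step, an extended Kalman filter whose observation vector is the TDOAs themselves; the microphone layout, pair selection and noise covariances are made up, and the full system of course adds state dynamics and time smoothing:

```python
# Sketch of one EKF update with the TDOAs as the observation vector and the
# 2-D speaker position as the state (no velocity states; geometry and noise
# covariances are made up).
import numpy as np

c = 343.0
mics = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]])   # assumed layout
pairs = [(0, 1), (0, 2), (0, 3)]

def h(x):
    """Predicted TDOAs for position x."""
    dist = np.linalg.norm(mics - x, axis=1)
    return np.array([(dist[i] - dist[j]) / c for i, j in pairs])

def jac(x):
    """Jacobian of h at x."""
    diff = x - mics
    unit = diff / np.linalg.norm(diff, axis=1, keepdims=True)
    return np.array([(unit[i] - unit[j]) / c for i, j in pairs])

def ekf_update(x, P, tdoa_meas, R):
    H = jac(x)
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_new = x + K @ (tdoa_meas - h(x))
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```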

87 citations


Journal ArticleDOI
TL;DR: The results indicate that robust processing strategies are needed to exploit interaural parameters successfully in noise conditions due to their strong temporal fluctuations and that elevation discrimination is possible even at low SNRs in the median plane by integrating information across frequency.
Abstract: The role of temporal fluctuations and systematic variations of interaural parameters in localization of sound sources in spatially distributed, nonstationary noise conditions was investigated. For this, Bayesian estimation was applied to interaural parameters calculated with physiologically plausible time and frequency resolution. Probability density functions (PDFs) of the interaural level differences (ILDs) and phase differences (IPDs) were estimated by measuring histograms for a directional sound source perturbed by several types of interfering noise at signal-to-noise ratios (SNRs) between −5 and +30dB. A moment analysis of the PDFs reveals that the expected values shift and the standard deviations increase considerably with decreasing SNR, and that the PDFs have non-Gaussian shape at medium SNRs. A d′ analysis of the PDFs indicates that elevation discrimination is possible even at low SNRs in the median plane by integrating information across frequency. Absolute sound localization was simulated by a ...

75 citations


Proceedings ArticleDOI
01 Oct 2006
TL;DR: A robot audition system that recognizes simultaneous speech in the real world using robot-embedded microphones; genetic algorithm (GA) based parameter optimization is developed because it is difficult to build an analytical optimization model for mutually dependent system parameters.
Abstract: This paper presents a robot audition system that recognizes simultaneous speech in the real world by using robot-embedded microphones. We have previously reported missing feature theory (MFT) based integration of sound source separation (SSS) and automatic speech recognition (ASR) for building robust robot audition. We demonstrated that a MFT-based prototype system drastically improved the performance of speech recognition even when three speakers talked to a robot simultaneously. However, the prototype system had three problems; being offline, hand-tuning of system parameters, and failure in voice activity detection (VAD). To attain online processing, we introduced FlowDesigner-based architecture to integrate sound source localization (SSL), SSS and ASR. This architecture brings fast processing and easy implementation because it provides a simple framework of shared-object-based integration. To optimize the parameters, we developed genetic algorithm (GA) based parameter optimization, because it is difficult to build an analytical optimization model for mutually dependent system parameters. To improve VAD, we integrated new VAD based on a power spectrum and location of a sound source into the system, since conventional VAD relying only on power often fails due to low signal-to-noise ratio of simultaneous speech. We, then, constructed a robot audition system for Honda ASIMO. As a result, we showed that the system worked online and fast, and had a better performance in robustness and accuracy through experiments on recognition of simultaneous speech in a noisy and echoic environment
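The GA-based parameter optimization is generic enough to sketch; the loop below is a plain genetic algorithm with a placeholder fitness function standing in for the recognition-accuracy measure the authors would actually evaluate:

```python
# Generic GA loop for tuning mutually dependent parameters; the fitness function
# is a placeholder, not the paper's actual evaluation.
import numpy as np

rng = np.random.default_rng(0)
n_params, pop_size, n_gen = 5, 20, 50        # made-up sizes
lo, hi = 0.0, 1.0                            # made-up parameter bounds

def fitness(params):
    # Placeholder: would be, e.g., recognition accuracy of the SSL/SSS/ASR pipeline.
    return -np.sum((params - 0.3) ** 2)

pop = rng.uniform(lo, hi, size=(pop_size, n_params))
for _ in range(n_gen):
    scores = np.array([fitness(p) for p in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]       # truncation selection
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        child = np.where(rng.random(n_params) < 0.5, a, b)   # uniform crossover
        child = child + rng.normal(0.0, 0.05, n_params)      # Gaussian mutation
        children.append(np.clip(child, lo, hi))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(p) for p in pop])]
```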

Journal ArticleDOI
TL;DR: In this article, an instrument for highly accurate measurements of the speed of sound in fluids in the temperature range between 240 and 420 K with pressures up to 100 MPa is described.
Abstract: An instrument for highly accurate measurements of the speed of sound in fluids in the temperature range between 240 and 420 K with pressures up to 100 MPa is described. The measurement principle of the speed of sound sensor is based on a double path length pulse-echo technique. The achieved measurement uncertainties are 3 mK for the temperature, 0.01% for the pressure below 10 MPa and 0.005% for the pressure between 10 and 100 MPa, and 0.014% for the speed of sound. The high accuracy of the instrument is demonstrated by measurements in liquid water and compressed argon. The results for argon prove that our pulse-echo technique agrees with the highly accurate spherical resonator technique, which is commonly employed for speed of sound measurements in gases, in the pressure range where both methods overlap within our measurement uncertainty.
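Assuming the double path-length geometry reduces to two reflectors at different one-way distances from the transducer, the principle comes down to a simple difference relation, illustrated here with made-up numbers:

```python
# Toy numbers only: two reflectors at one-way distances L1 and L2 from the
# transducer give echoes at t1 = 2*L1/c and t2 = 2*L2/c, so common delays cancel
# in the difference and c = 2*(L2 - L1) / (t2 - t1).
L1, L2 = 0.020, 0.050                         # path lengths [m] (made up)
t1, t2 = 2 * L1 / 1482.0, 2 * L2 / 1482.0     # simulated echo times for c = 1482 m/s

c = 2.0 * (L2 - L1) / (t2 - t1)
print(f"speed of sound = {c:.1f} m/s")        # recovers 1482.0
```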

Journal ArticleDOI
TL;DR: A method is derived for instantaneous source-range estimation in a horizontally stratified ocean waveguide from passive beam-time intensity data obtained after conventional plane-wave beamforming of acoustic array measurements that has advantages over existing source localization methods, such as matched field processing or the waveguide invariant.
Abstract: A method is derived for instantaneous source-range estimation in a horizontally stratified ocean waveguide from passive beam-time intensity data obtained after conventional plane-wave beamforming of acoustic array measurements. The method has advantages over existing source localization methods, such as matched field processing or the waveguide invariant. First, no knowledge of the environment is required except that the received field should not be dominated by purely waterborne propagation. Second, range can be estimated in real time with little computational effort beyond plane-wave beamforming. Third, array gain is fully exploited. The method is applied to data from the Main Acoustic Clutter Experiment of 2003 for source ranges between 1 and 8 km, where it is shown that simple, accurate, and computationally efficient source range estimates can be made.

Proceedings ArticleDOI
14 May 2006
TL;DR: The experimental results show that particle filter based integration reduces localization errors and provides accurate and robust 2D sound source tracking.
Abstract: Sound source tracking is an important function for a robot operating in a daily environment, because the robot should recognize where a sound event such as speech, music and other environmental sounds originates from. This paper addresses sound source tracking by integrating a room and a robot microphone array. The room microphone array consists of 64 microphones attached to the walls. It provides 2D (x-y) sound source localization based on a weighted delay-and-sum beamforming method. The robot microphone array consists of eight microphones installed on a robot head, and localizes multiple sound sources in azimuth. The localization results are integrated to track sound sources by using a particle filter for multiple sound sources. The experimental results show that particle filter based integration reduces localization errors and provides accurate and robust 2D sound source tracking.

Journal ArticleDOI
TL;DR: The design and performance of three architectures and corresponding protocols that use a variation of the Time-of-Flight method for localizing three different kinds of devices, namely 802.11-enabled PDAs, 3G cell phones, and PDAs without network connectivity are described.
Abstract: Sound source localization will play a major role in the new location-aware applications envisioned in Ubiquitous Computing. We describe the design and performance of three architectures and corresponding protocols that use a variation of the Time-of-Flight method for localizing three different kinds of devices, namely 802.11-enabled PDAs, 3G cell phones, and PDAs without network connectivity. The quantitative assessment is based on a deployment of 6 sensors in a 20x9 m room, serving over 10,000 localization requests. Our experiments indicate that all architectures achieve localization within 70 cm of the actual position 90% of the time. The accuracy is further improved to 40 cm 90% of the time when geometric factors are taken into consideration. The effects of noise and obstructions are also analyzed. Within a 1 m localization error, realistic noise degrades the accuracy by 6 to 10%. The presence of obstacles, such as humans and cement columns, has no observable effect on the performance.
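Once per-sensor distances are available from the Time-of-Flight measurements, position estimation is a multilateration problem; the least-squares sketch below uses a hypothetical six-sensor layout in a 20x9 m room (the paper's actual sensor coordinates and protocols are not given here):

```python
# Least-squares multilateration from ToF-derived distances; the sensor
# positions and the target position are hypothetical.
import numpy as np

sensors = np.array([[0, 0], [20, 0], [20, 9], [0, 9], [10, 0], [10, 9]], float)
true_pos = np.array([7.0, 4.0])
d = np.linalg.norm(sensors - true_pos, axis=1)          # distances from ToF

# Linearize ||x - s_i||^2 = d_i^2 by subtracting the equation for sensor 0.
s0, d0 = sensors[0], d[0]
A = 2.0 * (sensors[1:] - s0)
b = d0**2 - d[1:]**2 + np.sum(sensors[1:]**2, axis=1) - np.sum(s0**2)
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)                                            # ~ [7.0, 4.0]
```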

Journal ArticleDOI
TL;DR: Experimental results obtained with simulated reverberant samples and real audio recordings demonstrate that the new algorithm is more suitable for practical applications due to its reinitialisation capabilities, despite showing a slightly lower average tracking accuracy.
Abstract: Sequential Monte Carlo methods have been recently proposed to deal with the problem of acoustic source localisation and tracking using an array of microphones. Previous implementations make use of the basic bootstrap particle filter, whereas a more general approach involves the concept of importance sampling. In this paper, we develop a new particle filter for acoustic source localisation using importance sampling, and compare its tracking ability with that of a bootstrap algorithm proposed previously in the literature. Experimental results obtained with simulated reverberant samples and real audio recordings demonstrate that the new algorithm is more suitable for practical applications due to its reinitialisation capabilities, despite showing a slightly lower average tracking accuracy. A real-time implementation of the algorithm also shows that the proposed particle filter can reliably track a person talking in real reverberant rooms.
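For reference, a stripped-down bootstrap particle filter of the kind the paper compares against; the proposed importance-sampling proposal and reinitialisation logic are not shown, and the likelihood is a placeholder for the acoustic localization measure:

```python
# Stripped-down bootstrap particle filter for 2-D source tracking; room extent,
# noise levels and likelihood are placeholders, not the paper's values.
import numpy as np

rng = np.random.default_rng(1)
n_particles = 500
particles = rng.uniform([0.0, 0.0], [5.0, 4.0], size=(n_particles, 2))  # room extent (assumed)
weights = np.full(n_particles, 1.0 / n_particles)

def likelihood(positions, measurement):
    # Placeholder: in practice each candidate position is scored against the
    # acoustic evidence (e.g., steered-beamformer energy or TDOA fit).
    return np.exp(-0.5 * np.sum((positions - measurement) ** 2, axis=1) / 0.3 ** 2)

def step(particles, weights, measurement):
    particles = particles + rng.normal(0.0, 0.05, particles.shape)  # random-walk dynamics
    weights = weights * likelihood(particles, measurement)
    weights = weights / weights.sum()
    if 1.0 / np.sum(weights ** 2) < n_particles / 2:                # resample if ESS collapses
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles, weights = particles[idx], np.full(n_particles, 1.0 / n_particles)
    return particles, weights, weights @ particles                  # weighted-mean estimate
```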

Patent
11 Aug 2006
TL;DR: A beamformer attenuates the sound source signals arriving from directions symmetrical with respect to the perpendicular of the line connecting two microphones, by spectrum-analyzing the output signals from the microphones and multiplying the signals after spectrum analysis by weighting factors that are complex conjugates of each other.
Abstract: The sound source signal from a target sound source is separated from the mixed sound in which the sound source signals from a plurality of sound sources are mixed, without being influenced by variations in the sensitivity of the microphone elements. A beam former section (3) of the sound source separating device (1) performs beam formation to attenuate the sound source signals arriving from directions symmetrical with respect to the perpendicular of the line connecting two microphones (10, 11) by spectrum-analyzing the output signals from the microphones (10, 11) and multiplying the signals after the spectrum analysis by weighting factors that are complex conjugates of each other. Power calculating sections (40, 41) calculate power spectrum information. Target sound spectrum extracting sections (50, 51) extract spectrum information on target sound sources according to the difference between the power spectrum information from one beam former and that from the other.

Proceedings ArticleDOI
01 Oct 2006
TL;DR: This work combines the localization evidence over a variety of robot poses using an evidence grid to produce a representation that localizes the pertinent objects well over time, can be used to filter poor localization results, and may be useful for global re-localization from sound localization results.
Abstract: Sound source localization on a mobile robot can be a difficult task due to a variety of problems inherent to a real environment, including robot ego-noise, echoes, and the transient nature of ambient noise. As a result, source localization data are often very noisy and unreliable. In this work, we overcome some of these problems by combining the localization evidence over a variety of robot poses using an evidence grid. The result is a representation that localizes the pertinent objects well over time, can be used to filter poor localization results, and may also be useful for global re-localization from sound localization results.
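A possible minimal form of the evidence-grid idea; the update constants and the bearing-only measurement model below are assumptions, not taken from the paper:

```python
# Assumed update constants and a bearing-only measurement model; both are
# illustrative, not taken from the paper.
import numpy as np

L_OCC = 0.6                      # log-odds added to each cell along a bearing
grid = np.zeros((100, 100))      # 10 m x 10 m area at 10 cm resolution (assumed)

def add_bearing_evidence(grid, robot_cell, bearing_rad, max_range_cells=60):
    """Raise the log-odds of cells along the bearing estimated from one robot pose."""
    for r in range(1, max_range_cells):
        i = int(round(robot_cell[0] + r * np.cos(bearing_rad)))
        j = int(round(robot_cell[1] + r * np.sin(bearing_rad)))
        if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]:
            grid[i, j] = min(grid[i, j] + L_OCC, 10.0)   # clamp to limit saturation
    return grid

# Bearings from several poses accumulate; cells crossed more than once exceed
# the single-observation level, while isolated spurious detections stay weak.
for cell, bearing in [((20, 20), 0.79), ((20, 60), -0.20), ((70, 30), 2.10)]:
    grid = add_bearing_evidence(grid, cell, bearing)
candidate_cells = np.argwhere(grid > L_OCC)
```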

Journal ArticleDOI
TL;DR: In this paper, a transient acoustic holography method based on the Rayleigh integral and the time-reversal mirror principle is proposed to reconstruct the particle velocity of the surface of an acoustic source from the waveform of the signal measured over a surface lying in front of the source.
Abstract: A transient acoustic holography method based on the Rayleigh integral and the time-reversal mirror principle is described. The method reconstructs the particle velocity of the surface of an acoustic source from the waveform of the signal measured over a surface lying in front of the source. The possibility of applying the transient holography to studying pulsed sources used in ultrasonic diagnostics is investigated. A rectangular source that produces a short acoustic pulse and has a nonradiating defect on its surface is considered. A numerical simulation is used to demonstrate the possibility of a holographic reconstruction of the source vibrations. The effects of the spatial sampling step and the size of the measurement region on the reconstruction quality are demonstrated.

Proceedings ArticleDOI
01 Aug 2006
TL;DR: This work proposes a new binaural sound source localization technique based on using only two microphones placed inside the ear canal of a robot dummy head that is able to localize sound sources in free space with high precision and low computational complexity.
Abstract: For sound localization methods to be useful in real-time scenarios, the processing power requirements must be low enough to allow real time processing of audio inputs. We propose a new binaural sound source localization technique based on using only two microphones placed inside the ear canal of a robot dummy head. The head is equipped with artificial ears and is mounted on a torso. In contrast to existing 3D sound source localization methods using microphone arrays, our novel method employs two microphones and is based on a simple correlation approach using a generic set of Head Related Transfer Functions (HRTFs). The proposed method is demonstrated through simulation and is further tested in a household environment. This setup proves to be very noise-tolerant and is able to localize sound sources in free space with high precision and low computational complexity.

Journal ArticleDOI
TL;DR: In this paper, the effects of the Reynolds and Mach numbers on the sound generation and propagation characteristics were investigated by using a two-step aeroacoustic prediction method, in which the incompressible Navier-Stokes equations are solved numerically to predict the time-evolving acoustic field.

Proceedings ArticleDOI
01 Sep 2006
TL;DR: This work applies a recently presented TDOA estimation method based on blind adaptive multiple-input-multiple-output (MIMO) system identification to obtain the required set of TDOA estimates for the multidimensional localization of multiple sound sources and shows that the blind adaptive MIMO system identification allows a high spatial resolution.
Abstract: The TDOA-based acoustic source localization approach is a powerful and widely-used method which can be applied for one source in several dimensions or several sources in one dimension. However the localization turns out to be more challenging when multiple sound sources should be localized in multiple dimensions, due to a spatial ambiguity phenomenon which requires to perform an intermediate step after the TDOA estimation and before the calculation of the geometrical source positions. In order to obtain the required set of TDOA estimates for the multidimensional localization of multiple sound sources, we apply a recently presented TDOA estimation method based on blind adaptive multiple-input-multiple-output (MIMO) system identification. We demonstrate that this localization method also provides valuable side information which allows us to resolve the spatial ambiguity without any prior knowledge about the source positions. Furthermore we show that the blind adaptive MIMO system identification allows a high spatial resolution. Experimental results for the localization of two sources in a two-dimensional plane show the effectiveness of the proposed scheme

Proceedings ArticleDOI
01 Jan 2006
TL;DR: A new approach for binaural sound source localization in real world environments implementing a new model of the precedence effect enables the robust measurement of the localization cue values (ITD, ILD and IED) in echoic environments.

Abstract: We propose a new approach for binaural sound source localization in real world environments implementing a new model of the precedence effect. This enables the robust measurement of the localization cue values (ITD, ILD and IED) in echoic environments. The system is inspired by the auditory system of mammals. It uses a Gammatone filter bank for preprocessing and extracts the ITD and IED cues via zero crossings (ILD calculation is straightforward). The mapping between the cue values and the different angles is learned offline, which facilitates the adaptation to different head geometries. The performance of the system is demonstrated by localization results for two simultaneous speakers and the mixture of a speaker, music, and fan noise in a normal meeting room. A real time demonstrator of the system is presented in T. Rodemann, et al. (2006).
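A sketch of the zero-crossing ITD extraction for a single frequency band; a plain Butterworth band-pass stands in for one gammatone channel, and the crossing-matching heuristic is a simplification of whatever the authors actually use:

```python
# One-band zero-crossing ITD sketch; a Butterworth band-pass stands in for a
# gammatone channel, and the crossing matching below is a simplification.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 44100

def band(x, f_lo, f_hi):
    sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def rising_zero_crossings(x):
    idx = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    frac = -x[idx] / (x[idx + 1] - x[idx])               # sub-sample interpolation
    return (idx + frac) / fs

def itd_zero_crossing(left, right, f_lo=400.0, f_hi=600.0):
    tl = rising_zero_crossings(band(left, f_lo, f_hi))
    tr = rising_zero_crossings(band(right, f_lo, f_hi))
    n = min(len(tl), len(tr))
    diffs = tl[:n] - tr[:n]
    period = 1.0 / np.sqrt(f_lo * f_hi)                  # approximate band-centre period
    diffs = (diffs + period / 2) % period - period / 2   # wrap to +/- half a period
    return np.median(diffs)
```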

09 Mar 2006
TL;DR: In this paper, the most important effects that play a role in the acoustic modelling of combustion systems have been included in this network model, such as acoustic damping due to turbulence, acoustic reflection at contractions and expansions with a mean flow, modification of the acoustic speed of sound due to a mean flow, and the effect of a temperature gradient.

Abstract: The present study is concerned with the development and validation of efficient numerical algorithms to check combustion systems for their sensitivity to thermoacoustic instabilities. For this purpose, a good acoustic model is needed. Since the acoustics in combustion systems are essentially one-dimensional, an efficient one-dimensional acoustic network model has been used to model this acoustic system. The most important effects that play a role in the acoustic modelling of combustion systems have been included in this network model. These effects comprise acoustic damping due to turbulence, acoustic reflection at contractions and expansions with a mean flow, modification of the acoustic speed of sound due to a mean flow, and the effect of a temperature gradient. Consequently, the network model can handle most combustion system layouts.
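The simplest element of such a one-dimensional network is the four-pole (transfer) matrix of a uniform duct segment; the sketch below shows only this lossless, no-flow building block, whereas the paper's model additionally includes turbulence damping, mean-flow effects at area changes, and temperature gradients:

```python
# Lossless, no-flow four-pole of a uniform duct; the paper's network adds
# damping, mean-flow and temperature-gradient effects to elements like this.
import numpy as np

def duct_four_pole(f, L, S, c=343.0, rho=1.2):
    """Relate (pressure, volume velocity) at the inlet to those at the outlet."""
    k = 2.0 * np.pi * f / c
    Z = rho * c / S                                  # duct characteristic impedance
    return np.array([[np.cos(k * L), 1j * Z * np.sin(k * L)],
                     [1j * np.sin(k * L) / Z, np.cos(k * L)]])

# Elements are chained by matrix multiplication (duct -> area jump -> duct, ...);
# eigenfrequencies appear where the assembled matrix, combined with the boundary
# conditions, becomes singular.
T = duct_four_pole(200.0, L=0.5, S=0.01) @ duct_four_pole(200.0, L=0.3, S=0.02)
```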

Journal ArticleDOI
TL;DR: In this paper, a high-pass digital FIR filter is designed for the signal filtering, and the signal correlation analysis by correlation coefficient is performed to obtain accurate acoustic transit time data from the filtered simulated and received acoustic signals.

Journal ArticleDOI
TL;DR: In this article, four particle velocity sensors are combined to one (small) device to determine the four autospectra and the six cross spectra in a reverberant room, which gives information of the free field (sound field without a contribution of reflections) and of the reverberant field.
Abstract: In a sound field, disturbances of pressure, particle velocity, density, temperature, and energy occur. In this paper acoustic disturbances in air are considered. In the majority of papers on acoustics only changes in the sound pressure are reported, while in this paper results on the particle velocity are reported. Since particle velocity is a vector while pressure is a scalar, more information can be obtained when using a particle velocity sensor instead of a pressure sensor (microphone). Four particle velocity sensors are combined into one (small) device. In a reverberant room the four autospectra and the six cross spectra are determined. Interpretation of the measured results gives information on the free field (the sound field without a contribution of reflections) as well as on the reverberant field.

Proceedings ArticleDOI
01 Oct 2006
TL;DR: A broadband beampattern synthesis method for sound source localization in the nearfield or in the farfield of a mobile robot, with a small-size linear array, based on the theory of modal analysis and involves an original convex optimization procedure which benefits from the Parseval relation.
Abstract: This paper describes a broadband beampattern synthesis method for sound source localization in the nearfield or in the farfield of a mobile robot, with a small-size linear array. The method is based on the theory of Modal Analysis and involves an original convex optimization procedure which benefits from the Parseval relation. The optimized beampattern is obtained by numerically minimizing the worst-case error between the modal coefficients of the array response and those of the reference beampattern, up to a finite rank of the series expansion, over a frequency grid. Simulations illustrate the analytical development.

Proceedings ArticleDOI
01 Sep 2006
TL;DR: In this article, a vertical line array of 5 vector hydrophones was used to measure ambient noise near Ketchikan, Alaska, and the results were compared to theoretical models using different vertical noise power directivities.
Abstract: Vector hydrophones are compact acoustic sensors that provide intrinsic non-omnidirectional (e.g. dipole) beampatterns with a single sensor. Often the three orthogonal components of the particle velocity or particle acceleration field are measured using a three-axis neutrally buoyant motion sensor. This type of sensor can form dipole beams steered to any azimuth or elevation angle. If a conventional pressure-sensing hydrophone is integrated with the three-axis motion sensor, then different types of single-sensor responses can be formed including cardioids (with varying front-to-back ratios) and the acoustic intensity. An array of vector hydrophones can form beams using the different element-level responses or the weights for each sensor component can be set adaptively. A method for forming an "intensity beam" response which is not purely multiplicative is developed. A vertical line array of 5 vector hydrophones was used to measure ambient noise near Ketchikan, Alaska. Free-field voltage sensitivity calibrations are discussed. Frequency auto- and cross-spectra are presented for the pressure and particle acceleration components. The pressure, velocity, cardioid, and intensity single-sensor and beam responses are analyzed. Array gain results are compared to theoretical models using different vertical noise power directivities. Advantages and disadvantages of each type of beam response are discussed.

Proceedings Article
01 Jan 2006
TL;DR: This work generalizes the IEKF, first to a probabilistic data association filter, which incorporates a clutter model for rejection of spurious acoustic events, and then to a joint Probabilistic Data Association filter (JPDAF), which maintains a separate state vector for each active speaker.
Abstract: In prior work, we developed a speaker tracking system based on an extended Kalman filter using time delays of arrival (TDOAs) as acoustic features. While this system functioned well, its utility was limited to scenarios in which a single speaker was to be tracked. In this work, we remove this restriction by generalizing the IEKF, first to a probabilistic data association filter, which incorporates a clutter model for rejection of spurious acoustic events, and then to a joint probabilistic data association filter (JPDAF), which maintains a separate state vector for each active speaker. In a set of experiments conducted on seminar and meeting data, the JPDAF speaker tracking system reduced the multiple object tracking error from 20.7% to 14.3% with respect to the IEKF system. In a set of automatic speech recognition experiments conducted on the output of a 64 channel microphone array which was beamformed using automatic speaker position estimates, applying the JPDAF tracking system reduced word error rate from 67.3% to 66.0%. Moreover, the word error rate on the beamformed output was 13.0% absolute lower than on a single channel of the array. Index Terms: acoustic source localization, Kalman filter, person tracking, far-field speech recognition, microphone arrays

Patent
26 May 2006
TL;DR: A method is proposed to suppress sound input from sound sources other than the sound source in a predetermined direction, enhancing the target sound signal and suppressing surrounding noise with a simplified construction and without requiring many microphones.
Abstract: PROBLEM TO BE SOLVED: To provide a directional sound collector, a directional sound collecting method, and a computer program capable of enhancing a sound signal issued from a sound source in a predetermined direction and suppressing surrounding noise with a simplified construction, without needing to install many microphones, when the input sound signals each contain sounds and noise from sound sources existent in a plurality of directions. SOLUTION: Sound inputs from sound sources existent in a plurality of directions are received and converted to signals on a frequency axis. The converted signals on the frequency axis are corrected by calculating a suppression function for suppressing the converted signal on the frequency axis, and multiplying amplitude components of the signals on the frequency axis of the original signal by the calculated suppression function. A difference of phase components is calculated by computing the phase components of the converted signals on each frequency axis for every same frequency. On the basis of the difference of the phase components, a probability value is specified indicating the probability of a sound source being existent in the predetermined direction. On the basis of the specified probability value, the suppression function is calculated so as to suppress sound input from sound sources other than the sound source in the predetermined direction. COPYRIGHT: (C)2008,JPO&INPIT