
Showing papers on "Microphone array published in 2015"


Proceedings ArticleDOI
19 Apr 2015
TL;DR: A learning-based approach that can learn from a large amount of simulated noisy and reverberant microphone array inputs for robust DOA estimation and uses a multilayer perceptron neural network to learn the nonlinear mapping from such features to the DOA.
Abstract: This paper presents a learning-based approach to the task of direction of arrival (DOA) estimation from microphone array input. Traditional signal processing methods such as the classic least squares (LS) method rely on strong assumptions about signal models and on accurate estimation of the time delay of arrival (TDOA). They work well only in relatively clean conditions and suffer from noise and reverberation distortions. In this paper, we propose a learning-based approach that can learn from a large amount of simulated noisy and reverberant microphone array inputs for robust DOA estimation. Specifically, we extract features from the generalised cross correlation (GCC) vectors and use a multilayer perceptron neural network to learn the nonlinear mapping from such features to the DOA. One advantage of the learning-based method is that its estimates become more accurate as more training data becomes available. Experimental results on simulated data show that the proposed learning-based method produces much better results than the state-of-the-art LS method, and testing on real data recorded in meeting rooms shows improved root-mean-square error (RMSE) compared to the LS method.
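The GCC step underlying the paper's features can be illustrated with a plain time-domain cross-correlation TDOA estimator. This is a hedged sketch only: the paper's actual features come from GCC vectors (typically PHAT-weighted in the frequency domain) feeding a multilayer perceptron; here only the correlate-and-pick-the-peak idea is shown, and the function names are illustrative.

```python
def cross_correlation(x, y, max_lag):
    """Cross-correlate two equal-length signals for lags in [-max_lag, max_lag]."""
    n = len(x)
    corr = []
    for lag in range(-max_lag, max_lag + 1):
        s = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                s += x[i] * y[j]
        corr.append(s)
    return corr

def estimate_tdoa(x, y, max_lag):
    """Return the lag (in samples) at which y best aligns with x."""
    corr = cross_correlation(x, y, max_lag)
    best = max(range(len(corr)), key=lambda k: corr[k])
    return best - max_lag
```

For a signal arriving at the second microphone d samples later, the correlation peak sits at lag d; converting such lags to angles (given array geometry and sampling rate) yields the DOA that the paper's neural network instead learns to predict directly from the GCC vector.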

295 citations


Journal ArticleDOI
TL;DR: The analysis of the A. lightfooti data provides the first statistically rigorous estimate of calling male density for an anuran population using a microphone array, and it is shown that using TOA information can substantially improve estimate precision.
Abstract: Funding for the frog survey was received from the National Geographic Society/Waitt Grants Program (No. W184-11). The EPSRC and NERC helped to fund this research through a PhD grant (No. EP/I000917/1).

90 citations


Patent
08 Sep 2015
TL;DR: In this article, the authors present an integrated circuit including a microphone array, motion sensing circuitry, position sensing circuitry, analog-to-digital converter (ADC) circuitry configured to convert analog audio signals from the microphone array into digital audio signals for further processing, and a digital signal processor (DSP) or other circuitry for processing the digital audio signals based on motion data and other sensor data.
Abstract: The present disclosure relates generally to improving acoustic source tracking and selection and, more particularly, to techniques for acoustic source tracking and selection using motion or position information. Embodiments of the present disclosure include systems designed to select and track acoustic sources. In one embodiment, the system may be realized as an integrated circuit including a microphone array, motion sensing circuitry, position sensing circuitry, analog-to-digital converter (ADC) circuitry configured to convert analog audio signals from the microphone array into digital audio signals for further processing, and a digital signal processor (DSP) or other circuitry for processing the digital audio signals based on motion data and other sensor data. Sensor data may be correlated to the analog or digital audio signals to improve source separation or other audio processing.

80 citations


Journal ArticleDOI
TL;DR: A grid-based method to estimate the location of multiple sources in a wireless acoustic sensor network, where each sensor node contains a microphone array and only transmits direction-of-arrival (DOA) estimates in each time interval, reducing the transmissions to the central processing node.

80 citations


Patent
06 Jun 2015
TL;DR: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine the plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance as discussed by the authors.
Abstract: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.

76 citations


Journal ArticleDOI
TL;DR: In this paper, the virtual rotating array (VRA) method was used to identify the main noise contributors and determine a full spectrum for any rotating component of interest, including a four-bladed fan.
Abstract: Methods based on microphone array measurements provide a powerful tool for determining the location and magnitude of acoustic sources. For stationary sources, sophisticated algorithms working in the frequency domain can be applied. By using circularly arranged arrays and interpolating between microphone signals it is possible to treat rotating sources, as are present in fans, as being non-moving. Measurements conducted with a four-bladed fan and analyzed with the “virtual rotating array” method show that it is not only possible to identify the main noise contributors, but also to determine a full spectrum for any rotating component of interest.
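The interpolation idea behind the "virtual rotating array" can be sketched as follows. The published VRA method interpolates between microphone signals over time, tracking the rotor angle sample by sample; this minimal sketch, with illustrative names, shows only the spatial linear interpolation between the two physical microphones neighbouring a virtual microphone position at one instant.

```python
import math

def virtual_mic_sample(ring_samples, rotation_angle):
    """Sample a 'virtual' microphone on a ring of evenly spaced microphones,
    rotated by rotation_angle (radians), by linear interpolation between the
    two neighbouring physical microphones.
    ring_samples: one sample per physical microphone at a single time instant."""
    m = len(ring_samples)
    step = 2.0 * math.pi / m
    pos = (rotation_angle % (2.0 * math.pi)) / step
    i = int(pos)
    frac = pos - i
    return (1.0 - frac) * ring_samples[i % m] + frac * ring_samples[(i + 1) % m]
```

Rotating the virtual array synchronously with the fan makes the rotating sources appear stationary, so that standard frequency-domain beamforming can be applied to the interpolated signals.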

62 citations


Journal ArticleDOI
TL;DR: A likelihood that evaluates stationarity in the STFT domain is formulated to assess drift compensation, and the maximum likelihood estimate is obtained efficiently by a golden section search.

56 citations


Patent
27 Jul 2015
TL;DR: In this paper, the authors propose a failure detection system for an omnidirectional microphone array device having a plurality of microphone elements and a directivity control device that calculates a delay time of a voice propagated from a sound source to each microphone element.
Abstract: A failure detection system includes an omnidirectional microphone array device having a plurality of microphone elements and a directivity control device that calculates a delay time of a voice propagated from a sound source to each microphone element and forms a directivity of the voice using the delay time and the voice collected by the omnidirectional microphone array device, and detects a failure of the microphone element. A smoothing unit calculates an average power of one microphone element. An average calculator calculates a total average power of the usable microphone elements included in the omnidirectional microphone array device. A comparison unit checks whether the difference between the average power and the total average power exceeds a range of ±6 dB, and determines whether the microphone element has failed based on the comparison result.
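The ±6 dB comparison described above amounts to flagging channels whose average power deviates from the array-wide average by more than 6 dB. A minimal sketch, assuming the per-channel average powers have already been produced by the smoothing and averaging units; the function name and return convention are illustrative.

```python
import math

def detect_failed_mics(channel_powers, threshold_db=6.0):
    """Flag microphone indices whose average power deviates from the
    array-wide average power by more than ±threshold_db."""
    total_avg = sum(channel_powers) / len(channel_powers)
    failed = []
    for idx, p in enumerate(channel_powers):
        diff_db = 10.0 * math.log10(p / total_avg)
        if abs(diff_db) > threshold_db:
            failed.append(idx)
    return failed
```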

46 citations


Journal ArticleDOI
TL;DR: A parametric method for perceptual sound field recording and reproduction from a small-sized microphone array to arbitrary loudspeaker layouts is presented and it is demonstrated that, on the same task, the method outperforms linear reproduction with the same recordings available.
Abstract: This paper presents a parametric method for perceptual sound field recording and reproduction from a small-sized microphone array to arbitrary loudspeaker layouts. The applied parametric model has been found to be effective and well-correlated with perceptual attributes in the context of directional audio coding, and here it is generalized and extended to higher orders of spherical harmonic signals. Higher order recordings are used for estimation of the model parameters inside angular sectors that provide increased separation between simultaneous sources and reverberation. The perceptual synthesis according to the combined properties of these sector parameters is achieved with an adaptive least-squares mixing technique. Furthermore, considerations regarding practical microphone arrays are presented and a frequency-dependent scheme is proposed. A realization of the system is described for an existing spherical microphone array and for a target loudspeaker setup similar to NHK 22.2. It is demonstrated through listening tests that, compared to a reference scene, the perceived difference is greatly reduced with the proposed higher order analysis model. The results further indicate that, on the same task, the method outperforms linear reproduction with the same recordings available.

45 citations


Journal ArticleDOI
TL;DR: The results show improvement in instrumental measure for intelligibility and frequency-weighted SNR over complex-valued non-negative matrix factorization (CNMF) source separation approach, spatial sound source separation, and conventional beamforming methods such as the DSB and minimum variance distortionless response (MVDR).

45 citations


Patent
19 Mar 2015
TL;DR: In this article, an acoustic image device is utilized with a microphone array, image sensor, acoustic image controller, and a controller to detect sound variations by identifying regions of pixels having intensities exceeding a particular threshold.
Abstract: Techniques are disclosed for scene analysis including the use of acoustic imaging and computer audio vision processes for monitoring applications. In some embodiments, an acoustic image device is utilized with a microphone array, image sensor, acoustic image controller, and a controller. In some cases, the controller analyzes at least a portion of the spatial spectrum within the acoustic image data to detect sound variations by identifying regions of pixels having intensities exceeding a particular threshold. In addition, the controller can detect two or more co-occurring sound events based on the relative distance between pixels with intensities exceeding the threshold. The resulting data fusion of image pixel data, audio sample data, and acoustic image data can be analyzed using computer audio vision, sound/voice recognition, and acoustic signature techniques to recognize/identify audio and visual features associated with the event and to empirically or theoretically determine one or more conditions causing each event.

Patent
18 Nov 2015
TL;DR: In this article, a plurality of audio sensors are mounted on the surface of an acoustically rigid polyhedron that approximates a sphere, and the audio signals generated by those sensors are decomposed into a set of eigenbeams having at least one eigenbeam of order two (or higher).
Abstract: A microphone array-based audio system that supports representations of auditory scenes using second-order (or higher) harmonic expansions based on the audio signals generated by the microphone array. In one embodiment, a plurality of audio sensors are mounted on the surface of an acoustically rigid polyhedron that approximates a sphere. The number and location of the audio sensors on the polyhedron are designed to enable the audio signals generated by those sensors to be decomposed into a set of eigenbeams having at least one eigenbeam of order two (or higher). Beamforming (e.g., steering, weighting, and summing) can then be applied to the resulting eigenbeam outputs to generate one or more channels of audio signals that can be utilized to accurately render an auditory scene.

Patent
30 Dec 2015
TL;DR: In this article, sound is banked laterally over an array of microphones arranged on a rear surface of a device, and the resulting sound enters a duct behind the device from different directions via inlets along the sides of the device.
Abstract: Sound is banked laterally over an array of microphones arranged on a rear surface of a device. Sound enters a duct behind the device from different directions via inlets along the sides of the device. The duct directs the sound waves across the microphone array. An effective direction from which the banked sounds originated is determined, relative to a front of the device. Based on the determined effective direction, the device applies spatial filtering to isolate the received sound waves, selectively increasing a signal-to-noise ratio of sound from the selected source and at least partially occluding sounds from other sources.

Journal ArticleDOI
TL;DR: A method to design two-dimensional planar microphone arrays that are capable of capturing three-dimensional (3D) spatial soundfields is proposed, capable of measuring soundfield components that are undetectable to conventional planar omni-directional microphone arrays.
Abstract: Soundfield analysis based on spherical harmonic decomposition has been widely used in various applications; however, a drawback is the three-dimensional geometry of the microphone arrays. In this paper, a method to design two-dimensional planar microphone arrays that are capable of capturing three-dimensional (3D) spatial soundfields is proposed. Through the utilization of both omni-directional and first order microphones, the proposed microphone array is capable of measuring soundfield components that are undetectable to conventional planar omni-directional microphone arrays, thus providing the same functionality as 3D arrays designed for the same purpose. Simulations show that the accuracy of the planar microphone array is comparable to traditional spherical microphone arrays. Due to its compact shape, the proposed microphone array greatly increases the feasibility of 3D soundfield analysis techniques in real-world applications.

Journal ArticleDOI
TL;DR: Improved performance achieved by this cooperative node-specific direction-of-arrival (DOA) estimation in a fully connected wireless acoustic sensor network (WASN) is demonstrated by means of numerical simulations for two different subspace-based DOA estimation methods (MUSIC and ESPRIT).

Journal ArticleDOI
TL;DR: Maximum likelihood methods are applied for direction of arrival estimation of reflections in short time windows of room impulse responses measured with a spherical microphone array to show that direction estimation with ML methods is more robust against noise and less biased than MUSIC or beamforming.
Abstract: This paper studies the direction of arrival estimation of reflections in short time windows of room impulse responses measured with a spherical microphone array. Spectral-based methods, such as multiple signal classification (MUSIC) and beamforming, are commonly used in the analysis of spatial room impulse responses. However, the room acoustic reflections are highly correlated or even coherent in a single analysis window and this imposes limitations on the use of spectral-based methods. Here, we apply maximum likelihood (ML) methods, which are suitable for direction of arrival estimation of coherent reflections. These methods have been earlier developed in the linear space domain and here we present the ML methods in the context of spherical microphone array processing and room impulse responses. Experiments are conducted with simulated and real data using the em32 Eigenmike. The results show that direction estimation with ML methods is more robust against noise and less biased than MUSIC or beamforming.

Proceedings ArticleDOI
09 Nov 2015
TL;DR: Speaker directional information, obtained using sound source localization from a microphone array is used to supervise the training of these video features that aim to capture other cues: movement of the head, upper body and hands of active speakers.
Abstract: Active speakers have traditionally been identified in video by detecting their moving lips. This paper demonstrates the same using spatio-temporal features that aim to capture other cues: movement of the head, upper body and hands of active speakers. Speaker directional information, obtained using sound source localization from a microphone array is used to supervise the training of these video features.

Proceedings ArticleDOI
28 Dec 2015
TL;DR: This work proposes a novel method for 3D direction of arrival (DOA) estimation based on the sound intensity vector estimation, via the encoding of the signals of a spherical microphone array from the space domain to the spherical harmonic domain.
Abstract: This work proposes a novel method for 3D direction of arrival (DOA) estimation based on the sound intensity vector estimation, via the encoding of the signals of a spherical microphone array from the space domain to the spherical harmonic domain. The sound intensity vector is estimated on detected single source zones (SSZs), where one source is dominant. A smoothed 2D histogram of these estimates reveals the DOA of the present sources and through an iterative process, accurate 3D DOA information can be obtained. The performance of the proposed method is demonstrated through simulations in various signal-to-noise ratio and reverberation conditions.
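The intensity-vector step can be sketched for first-order (B-format-like) components: the time-averaged products of the pressure signal with the three velocity-proportional signals give an intensity vector whose direction is the DOA estimate. This is a simplified sketch, assuming broadband time-domain signals rather than the paper's per-time-frequency single-source-zone processing and histogram; names are illustrative.

```python
import math

def intensity_doa(w, x, y, z):
    """Estimate azimuth/elevation (radians) from first-order signals
    (pressure w and velocity-proportional x, y, z) via the time-averaged
    active intensity vector."""
    n = len(w)
    ix = sum(wi * xi for wi, xi in zip(w, x)) / n
    iy = sum(wi * yi for wi, yi in zip(w, y)) / n
    iz = sum(wi * zi for wi, zi in zip(w, z)) / n
    azimuth = math.atan2(iy, ix)
    elevation = math.atan2(iz, math.hypot(ix, iy))
    return azimuth, elevation
```

For a single plane wave in noise-free conditions the averaged intensity vector points exactly along the arrival direction; the paper's single-source-zone detection serves to approximate this single-wave condition locally in the time-frequency plane.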

Journal ArticleDOI
TL;DR: The proposed distributed unscented Kalman filter (DUKF) can estimate speaker's positions globally in the network and obtain a smoothed trajectory of the speaker's movement robustly in noisy and reverberant environments, and it is scalable for speaker tracking.
Abstract: In this paper, we first propose a distributed unscented Kalman filter (DUKF) to overcome the nonlinearity of the measurement model in speaker tracking. Next, to handle the different motion dynamics of a speaker in the indoor environment, we introduce the interacting multiple model (IMM) algorithm and propose a distributed interacting multiple model-unscented Kalman filter (IMM-UKF) for estimating time-varying speaker positions in a microphone array network. In the distributed IMM-UKF based speaker tracking method, the time difference of arrival (TDOA) of the speech signals received by a pair of microphones at each node is estimated by the generalized cross-correlation (GCC) method; the distributed IMM-UKF is then used to track a speaker whose position and speed vary significantly over time in the microphone array network. The proposed method can estimate the speaker's positions globally in the network, obtains a smoothed trajectory of the speaker's movement robustly in noisy and reverberant environments, and is scalable for speaker tracking. Simulation and real-world experiment results demonstrate the effectiveness of the proposed speaker tracking method.
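A full IMM-UKF is beyond a short snippet, but the smoothing role of the Kalman stage can be illustrated with a one-dimensional linear filter. This is a deliberate stand-in, not the paper's method: a constant-position scalar state with hand-picked process and measurement variances, where the UKF (and the IMM bank of motion models) would operate on the nonlinear TDOA measurement model instead.

```python
def kalman_1d(measurements, q=0.01, r=0.5):
    """Minimal 1-D Kalman filter (constant-position model) that smooths
    a noisy position track. q: process variance, r: measurement variance."""
    x, p = measurements[0], 1.0
    estimates = [x]
    for z in measurements[1:]:
        p += q                  # predict: state unchanged, uncertainty grows
        k = p / (p + r)         # Kalman gain
        x += k * (z - x)        # update toward the new measurement
        p *= (1.0 - k)
        estimates.append(x)
    return estimates
```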

Patent
21 Oct 2015
TL;DR: In this article, a locating and tracking method based on a sound source array is proposed, which includes the following steps: S1, acquiring on-site sound through a quintuple microphone array, and pre-processing the sound signal acquired by each microphone in the quintuple microphone array to obtain an audio signal; S2, performing sound source locating for the audio signals according to the arrival time delays of the audio signals between the microphones and the positional information of the microphone array.
Abstract: The present invention discloses a locating and tracking method based on a sound source array, comprising the following steps: S1, acquiring on-site sound through a quintuple microphone array, and pre-processing the sound signal acquired by each microphone in the quintuple microphone array to obtain an audio signal; S2, performing sound source locating for the audio signals according to the arrival time delays of the audio signals between the microphones and the positional information of the microphone array, so as to calculate a pitch angle, an azimuth angle and an object distance; and S3, moving and turning a locating and tracking apparatus so that it arrives at the sound source position. The locating and tracking method of the present invention compensates for the influence of non-Gaussian noise, coherent noise and indoor reverberation on accurate locating of the sound source, thereby improving the locating accuracy.
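Step S2's outputs (pitch angle, azimuth angle, object distance) follow by basic trigonometry once a source position estimate is available. A minimal sketch assuming a Cartesian source position relative to the array centre has already been obtained; the patent itself derives the position from the arrival-time delays and the array geometry.

```python
import math

def source_angles(x, y, z):
    """Pitch angle, azimuth angle and distance of a source at (x, y, z)
    relative to the array centre, angles in radians."""
    dist = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)          # angle in the horizontal plane
    pitch = math.asin(z / dist)         # elevation above the horizontal plane
    return pitch, azimuth, dist
```

Step S3 then reduces to turning the apparatus by the azimuth, tilting by the pitch, and moving the computed distance.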

Proceedings ArticleDOI
11 Dec 2015
TL;DR: This paper considers acoustic information recognition from a hovering helicopter which a microphone array system, and an acoustic model of the rotor noise is derived taking the dynamics of the helicopter into account, and the model is utilized to evaluate the performance of the microphone array.
Abstract: Unmanned multirotor helicopters are expected to perform various critical tasks such as rescue missions, and it is important to sense the environment to achieve those tasks. In order to realize such sensor systems, this paper considers acoustic information recognition from a hovering helicopter equipped with a microphone array system. As the noise of the rotors distorts the acoustic information, noise reduction is necessary. Since the rotor noise varies even during hovering flight, an acoustic model of the rotor noise is derived taking the dynamics of the helicopter into account, and the model is utilized to evaluate the performance of the microphone array. The proposed approach was verified by evaluating the optimality of a real device that was empirically tuned in the authors' previous work; the computed configuration of the microphone array and the developed one coincide well, which supports the validity of the approach. As validation through a practical application, the signal was processed by a delay-and-sum beamformer to localize the sound source. The system was able to find peaks of the power that corresponded to the sound source.
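The delay-and-sum beamformer used for the localization step can be sketched with integer sample delays: steering toward the true direction aligns the channels, so the summed output's power peaks there. A simplified sketch (integer delays only, no rotor-noise model); names and the two-channel test geometry are illustrative.

```python
def delay_and_sum_power(channels, delays):
    """Steered-response power of a delay-and-sum beamformer: shift each
    channel by its integer steering delay (in samples), sum the shifted
    channels, and return the output energy."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(n):
            j = i - d
            if 0 <= j < n:
                out[i] += ch[j]
    return sum(v * v for v in out)
```

Scanning a grid of candidate delay sets (each corresponding to a look direction) and picking the maximum-power set yields the source direction, which is how the peaks mentioned in the abstract arise.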

Proceedings ArticleDOI
01 Oct 2015
TL;DR: Using the proposed wave-domain adaptive processing for noise cancellation within a large spatial region, the noise over the entire control region can be significantly reduced with fast convergence in both free-field and reverberant environments.
Abstract: This paper proposes wave-domain adaptive processing for noise cancellation within a large spatial region. We use fundamental solutions of the Helmholtz wave-equation as basis functions to express the noise field over a spatial region and show the wave-domain processing directly on the decomposition coefficients to control the entire region. A feedback control system is implemented, where only a single microphone array is placed at the boundary of the control region to measure the residual signals, and a loudspeaker array is used to generate the anti-noise signals. We develop the adaptive wave-domain filtered-x least mean square algorithm. Simulation results show that using the proposed method the noise over the entire control region can be significantly reduced with fast convergence in both free-field and reverberant environments.
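The adaptive core of the wave-domain filtered-x LMS can be illustrated in its simplest time-domain form: an LMS filter adapting anti-noise weights from the residual error. This is a heavily simplified stand-in (single channel, identity secondary path, time-domain taps instead of wave-domain decomposition coefficients); it shows only the shape of the filtered-x LMS update, with illustrative names and parameters.

```python
def lms_anc(noise_ref, primary, taps=4, mu=0.01):
    """Minimal adaptive noise canceller: LMS weights generate anti-noise
    from the reference so that the residual at the error sensor shrinks.
    Returns the residual-error sequence."""
    w = [0.0] * taps
    buf = [0.0] * taps
    residuals = []
    for x, d in zip(noise_ref, primary):
        buf = [x] + buf[:-1]                            # shift in new reference sample
        y = sum(wi * bi for wi, bi in zip(w, buf))      # anti-noise output
        e = d - y                                       # residual at the error microphone
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # LMS weight update
        residuals.append(e)
    return residuals
```

In the paper this update runs on spatial decomposition coefficients measured by the boundary microphone array, so a single adaptation loop controls the noise field over the whole region rather than at one point.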

Journal ArticleDOI
TL;DR: In this paper, the authors propose a low-cost self-localization method which uses a four-element microphone array, wheel rotation, and sound sources as beacons whose absolute locations and frequency bands are known.
Abstract: In this paper, we propose a low-cost self-localization method which uses a four-element microphone array, wheel rotation, and sound sources as beacons whose absolute locations and frequency bands are known. The proposed method consists of the following four steps: it (i) performs self-localization using wheel-based odometry, (ii) estimates the direction of arrival (DOA) of the sound sources using sounds recorded by the elements of the microphone array, (iii) predicts the DOA of the sound sources from the estimated location and pose, and (iv) performs self-localization by integrating all of this information. Experiments were conducted to evaluate the proposed method against two conventional methods: wheel-based odometry and self-localization using only DOA. In the experiments, we assumed a house-cleaning robot and its trajectory. Without any obstacles or walls, the mean estimation errors of wheel-based odometry were 670 mm and 0.08 rad, and those of self-localization using only DOA were 2870 mm and 0.07 rad in the worst case. In contrast, the proposed method yields worst-case estimation errors of 69 mm and 0.02 rad for self-location and pose. With occlusion of a sound source, the mean localization error increased by 60 mm, as the proposed method detects the incorrect DOA and excludes it from the estimation. With reflective waves from a wall, there was a place where the localization error was large; the cause was considered to be the directivity of the sound source. These results indicate that the proposed method is feasible in indoor environments.

Journal ArticleDOI
TL;DR: A framework was developed to simulate the entire path of sounds presented in a modeled room, recorded by a HOA microphone array, decoded to a loudspeaker array, and received at the ears and HA microphones of a dummy listener fitted with HAs and found that the diffuse reverberation reduces the considered time-averaged HOA reconstruction errors.
Abstract: Recently, an increased interest has been demonstrated in evaluating hearing aids (HAs) inside controlled, but at the same time, realistic sound environments. A promising candidate that employs loudspeakers for realizing such sound environments is the listener-centered method of higher-order ambisonics (HOA). Although the accuracy of HOA has been widely studied, it remains unclear to what extent the results can be generalized when (1) a listener wearing HAs that may feature multi-microphone directional algorithms is considered inside the reconstructed sound field and (2) reverberant scenes are recorded and reconstructed. For the purpose of objectively validating HOA for listening tests involving HAs, a framework was developed to simulate the entire path of sounds presented in a modeled room, recorded by a HOA microphone array, decoded to a loudspeaker array, and finally received at the ears and HA microphones of a dummy listener fitted with HAs. Reproduction errors at the ear signals and at the output of a cardioid HA microphone were analyzed for different anechoic and reverberant scenes. It was found that the diffuse reverberation reduces the considered time-averaged HOA reconstruction errors which, depending on the considered application, suggests that reverberation can increase the usable frequency range of a HOA system.

Patent
26 May 2015
TL;DR: In this paper, the authors present methods, circuits, devices, systems and associated computer executable code for acquiring, processing and rendering acoustic signals, where one or more direction specific audio signals may be generated using a microphone array comprising two or more microphones and an audio stream generator.
Abstract: The present invention includes methods, circuits, devices, systems and associated computer executable code for acquiring, processing and rendering acoustic signals. According to some embodiments, one or more direction specific audio signals may be generated using a microphone array comprising two or more microphones and an audio stream generator. The audio stream generator may receive a direction parameter from an optical tracking system. There may be provided an audio rendering system adapted to normalize and/or balance acoustic signals acquired from a soundscape.

Journal ArticleDOI
TL;DR: The benefit provided to listeners with sensorineural hearing loss by an acoustic beamforming microphone array was determined in a speech-on-speech masking experiment; masked speech reception thresholds for spatially separated maskers were higher (poorer) on average for the SNHL listeners than for the normal-hearing listeners.
Abstract: The benefit provided to listeners with sensorineural hearing loss (SNHL) by an acoustic beamforming microphone array was determined in a speech-on-speech masking experiment. Normal-hearing controls...

Journal ArticleDOI
TL;DR: A signal model that takes the motion of the robot into account is presented and it is demonstrated that by using the motion-based enhancement method it is possible to improve the direction of arrival estimation performance, as compared to that obtained when using a stationary array.
Abstract: The auditory system of humanoid robots has gained increased attention in recent years. This system typically acquires the surrounding sound field by means of a microphone array. Signals acquired by the array are then processed using various methods. One of the widely applied methods is direction of arrival estimation. The conventional direction of arrival estimation methods assume that the array is fixed at a given position during the estimation. However, this is not necessarily true for an array installed on a moving humanoid robot. The array motion, if not accounted for appropriately, can introduce a significant error in the estimated direction of arrival. The current paper presents a signal model that takes the motion into account. Based on this model, two processing methods are proposed. The first one compensates for the motion of the robot. The second method is applicable to periodic signals and utilizes the motion in order to enhance the performance to a level beyond that of a stationary array. Numerical simulations and an experimental study are provided, demonstrating that the motion compensation method almost eliminates the motion-related error. It is also demonstrated that by using the motion-based enhancement method it is possible to improve the direction of arrival estimation performance, as compared to that obtained when using a stationary array.

Journal ArticleDOI
TL;DR: Evaluation of the system performance in comparison with other state-of-the-art methods indicates that the proposed design is practical for acoustic target classification and may be widely adopted by UGS.
Abstract: The acoustic recognition module of the unattended ground sensor (UGS) system applied in wild environments faces the challenge of complicated noise interference. In this paper, a small-aperture microphone array (MA)-based acoustic target classification system, including the system hardware architecture and classification algorithm scheme, is designed as a node-level sensor for the application of UGS in noisy situations. Starting from an analysis of the signature of acoustic signals in wild environments and the merits of a small-aperture array in noise reduction, a closely arranged microelectromechanical-systems MA is designed to improve the signal quality. Considering the similarities between speaker discrimination and acoustic target recognition, a classification algorithm scheme, consisting of simplified Mel-frequency cepstral coefficients and a Gaussian mixture model, is developed to distinguish acoustic target patterns. The proposed classification algorithm was implemented on an embedded system after being tested on training datasets. By combining the small-aperture array and a low-complexity classification algorithm, the presented acoustic classification prototype system is portable and efficient. To demonstrate the efficiency of the design, the prototype system was verified in a practical situation with wheeled and tracked vehicles. Evaluation of the system performance in comparison with other state-of-the-art methods indicates that the proposed design is practical for acoustic target classification and may be widely adopted by UGS.
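The classification stage (MFCC features scored by per-class Gaussian mixture models) reduces, in its simplest degenerate form, to picking the class whose Gaussian assigns the highest likelihood to a feature. This is a deliberately minimal stand-in: one-dimensional features and single-component "mixtures", where real use scores full MFCC vectors against multi-component GMMs trained per target class. The feature values and class names are invented for illustration.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of scalar x under a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def classify(feature, class_models):
    """Return the class label whose Gaussian (mean, var) scores highest."""
    return max(class_models, key=lambda c: gaussian_loglik(feature, *class_models[c]))
```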

Journal ArticleDOI
TL;DR: To avoid degradation of the accuracy of the synthesized sound space by microphone internal noise, particularly in the low-frequency region, the effect of the signal-to-noise ratio (SNR) of the microphones is analyzed and controlled via the condition number of the matrix constructed from the transfer functions.
Abstract: Sensing of high-definition three-dimensional (3D) sound-space information is of crucial importance for realizing total 3D spatial sound technology. We have proposed a sensing method for 3D sound-space information using symmetrically and densely arranged microphones. This method is called SENZI (Symmetrical object with ENchased Zillion microphones). In the SENZI method, signals recorded by the microphones are simply weighted and summed to synthesize a listener's head-related transfer functions (HRTFs), reflecting the direction in which the listener is facing even after recording. The SENZI method is being developed as a real-time system using a spherical microphone array and field-programmable gate arrays (FPGAs). In the SENZI system, 252 electret condenser microphones (ECMs) were almost uniformly distributed on a rigid sphere. The deviations of the microphone frequency responses were compensated for using the transfer function of the rigid sphere. To avoid degradation of the accuracy of the synthesized sound space by microphone internal noise, particularly in the low-frequency region, we analyzed the effect of the signal-to-noise ratio (SNR) of the microphones on the accuracy of the synthesized sound-space information while controlling the condition number of the matrix constructed from the transfer functions. On the basis of these analyses, a compact SENZI system was implemented. Experimental results indicated that 3D sound-space information was well reproduced using the system.