
Showing papers on "Microphone array published in 2002"


Proceedings ArticleDOI
13 May 2002
TL;DR: This paper describes a beamforming microphone array consisting of pressure microphones mounted on the surface of a rigid sphere; the beamformer is based on a spherical harmonic decomposition of the soundfield, which allows a simple, computationally effective, yet flexible beamformer structure.
Abstract: This paper describes a beamforming microphone array consisting of pressure microphones that are mounted on the surface of a rigid sphere. The beamformer is based on a spherical harmonic decomposition of the soundfield. We show that this allows a simple and computationally effective, yet flexible beamformer structure. The look-direction can be steered to any direction in 3-D space without changing the beampattern. In general the number of sensors and their location is quite arbitrary as long as they satisfy a certain orthogonality constraint that we derive. For a practical example we chose a spherical array with 32 elements. The microphones are located at the centers of the faces of a truncated icosahedron. The radius of the sphere is 5 cm. With this setup we can achieve a Directivity Index of 12 dB and higher. The operating frequency range is from 100 Hz to 5 kHz.
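
As a rough illustration of the decomposition-plus-steering idea (my sketch, not the authors' implementation, which also equalizes the rigid-sphere mode strengths), one can project a pressure snapshot onto spherical harmonics and steer by re-evaluating only the look-direction coefficients; the random sensor directions below are stand-ins for the truncated-icosahedron layout:

```python
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, az, col):
    """Spherical harmonics Y_nm sampled at the sensor directions.
    az: azimuth, col: colatitude (scipy's sph_harm convention)."""
    cols = [sph_harm(m, n, az, col)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)              # (num_mics, (order+1)**2)

rng = np.random.default_rng(0)                 # 32 stand-in directions
az = rng.uniform(0.0, 2.0 * np.pi, 32)
col = np.arccos(rng.uniform(-1.0, 1.0, 32))

order = 3                                      # (3+1)**2 = 16 <= 32 sensors
Y = sh_matrix(order, az, col)

p = rng.standard_normal(32)                    # one pressure snapshot
a_nm = np.linalg.pinv(Y) @ p                   # soundfield -> SH coefficients

# Steering only re-evaluates the look-direction coefficients; the
# beampattern itself is unchanged, which is the property the paper exploits.
look = sh_matrix(order, np.array([0.5]), np.array([1.0]))[0]
y_out = np.real(np.conj(look) @ a_nm)          # beamformer output sample
```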

433 citations


Proceedings ArticleDOI
13 May 2002
TL;DR: Spherical harmonics analysis is used to establish theory and design of a higher order recording system, which comprises an array of small microphones arranged in a spherical configuration and associated signal processing, which has implications for the advancement of future sound field reconstruction systems.
Abstract: A major problem in sound field reconstruction systems is how to record the higher order (> 1) harmonic components of a given sound field. Spherical harmonics analysis is used to establish the theory and design of a higher order recording system, which comprises an array of small microphones arranged in a spherical configuration and associated signal processing. This result has implications for the advancement of future sound field reconstruction systems. An example of a third order system for operation over a 10:1 frequency range of 340 Hz to 3.4 kHz is given.
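
Whether order-N components are recoverable depends on the microphone layout, not just the microphone count. A small conditioning check (an illustration under an open-sphere assumption, not the paper's design procedure):

```python
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, az, col):
    cols = [sph_harm(m, n, az, col)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)

# A third-order system needs at least (3+1)**2 = 16 microphones; beyond
# the count, the sampling must keep the harmonics distinguishable.
rng = np.random.default_rng(1)
az = rng.uniform(0.0, 2.0 * np.pi, 32)         # stand-in sampling points
col = np.arccos(rng.uniform(-1.0, 1.0, 32))
Y = sh_matrix(3, az, col)                      # 32 mics x 16 SH terms

# For near-orthonormal sampling, (4*pi/Q) * Y^H Y is close to the identity;
# a small condition number means order-3 components can be recovered.
G = (4.0 * np.pi / 32) * Y.conj().T @ Y
print("condition number:", np.linalg.cond(G))
```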

252 citations


Proceedings ArticleDOI
13 May 2002
TL;DR: This work compares the performance of a null-steering beamformer against that of a frequency-domain BSS method in a reverberant environment, and proposes a permutation alignment scheme based on information gathered from the microphone array directivity patterns.
Abstract: In this work, we explore important connections between blind source separation (BSS) and ideal beamforming. We first compare the performance of a null-steering beamformer against that of a frequency-domain BSS method in a reverberant environment, drawing some interesting conclusions. We then examine the feasibility of using beamformer concepts to resolve permutation inconsistency across frequency, which degrades the performance of BSS methods in a reverberant environment. We also propose a permutation alignment scheme based on information gathered from the microphone array directivity patterns. This technique is novel in the sense that it works satisfactorily even when the directivity patterns exhibit grating lobes, where, in fact, better separation can be achieved in principle. We perform experiments that support the viability of the proposed method under different operating conditions and microphone spacings.
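
To make the directivity-pattern idea concrete, the sketch below (with a stand-in demixing matrix, not one learned by BSS) evaluates each output's response over angle at one frequency; matching the null directions across frequency bins is what yields the alignment:

```python
import numpy as np

def output_patterns(W_f, f, mic_pos, angles, c=343.0):
    """Directivity of each BSS output at frequency f for a linear array.
    W_f: (outputs, mics) demixing matrix; mic_pos: positions in metres."""
    delays = np.outer(np.sin(angles), mic_pos) / c     # (angles, mics)
    steering = np.exp(2j * np.pi * f * delays)         # far-field model
    return np.abs(steering @ W_f.T)                    # (angles, outputs)

angles = np.deg2rad(np.linspace(-90.0, 90.0, 181))
mic_pos = np.array([0.0, 0.04])                        # 4 cm microphone pair
W_f = np.array([[1.0, -0.8 + 0.2j],                    # stand-in demixing
                [0.5, 1.0 + 0.0j]])                    # matrix for one bin
pattern = output_patterns(W_f, 1000.0, mic_pos, angles)
null_dirs = np.degrees(angles[np.argmin(pattern, axis=0)])
# Sorting outputs by null direction in every bin yields a consistent
# ordering across frequency, which resolves the permutation ambiguity.
```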

161 citations


PatentDOI
Yong Rui1
TL;DR: In this article, a system and process for estimating the location of a speaker using signals output by a microphone array characterized by multiple pairs of audio sensors is described, and a consensus location for the speaker is computed from the individual location estimates associated with each pair of microphone array audio sensors taking into consideration the uncertainty of each estimate.
Abstract: A system and process is described for estimating the location of a speaker using signals output by a microphone array characterized by multiple pairs of audio sensors. The location of a speaker is estimated by first determining whether the signal data contains human speech components and filtering out noise attributable to stationary sources. The location of the person speaking is then estimated using a time-delay-of-arrival based SSL technique on those parts of the data determined to contain human speech components. A consensus location for the speaker is computed from the individual location estimates associated with each pair of microphone array audio sensors taking into consideration the uncertainty of each estimate. A final consensus location is also computed from the individual consensus locations computed over a prescribed number of sampling periods using a temporal filtering technique.
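
The patent does not prescribe a specific estimator, but the per-pair delay step in time-delay-of-arrival SSL is commonly done with GCC-PHAT; a minimal sketch:

```python
import numpy as np

def gcc_phat(x1, x2, fs):
    """Delay of x2 relative to x1 (seconds, positive when x2 lags),
    estimated with the phase transform (PHAT) weighting."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    R = X2 * np.conj(X1)
    R /= np.abs(R) + 1e-12                  # PHAT: keep only the phase
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

fs, d = 16000, 8                            # true delay: 8 samples
src = np.random.default_rng(0).standard_normal(fs)
x1, x2 = src, np.concatenate((np.zeros(d), src[:-d]))
print(gcc_phat(x1, x2, fs) * fs)            # ~ 8.0
```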

108 citations


Journal Article
TL;DR: In this paper, the connection between circular holophony, high-order incoming and outgoing ambisonics, and plane-wave decomposition for a sound field was established and used as a tool for auralization.
Abstract: In order to correctly reproduce ("auralize") the acoustic wave field in a hall through a wave-field synthesis (WFS) system, impulse responses are nowadays measured along arrays of microphone positions. Three array configurations are considered: linear, cross, and circular. The linear and cross array configurations both have strong limitations, most of which can be avoided by using circular arrays. Auralization techniques are explained for all types of arrays. For the circular array configuration, the connection between circular holophony, high-order incoming and outgoing ambisonics, and plane-wave decomposition for a sound field will be established and used as a tool for auralization.
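
The plane-wave decomposition step for a circular array can be sketched as a spatial DFT over the microphone angle followed by division by the Bessel radial terms. This is a minimal open-array version (rigid baffles or directional elements change the radial term, and bins near Bessel zeros need regularization):

```python
import numpy as np
from scipy.special import jv

def circular_pwd(p, r, f, max_order, c=343.0):
    """Circular-harmonic (holophonic) coefficients from one frequency-
    domain snapshot p over Q equally spaced microphones of radius r."""
    Q = len(p)
    k = 2 * np.pi * f / c
    phi = 2 * np.pi * np.arange(Q) / Q
    orders = np.arange(-max_order, max_order + 1)
    C = np.array([p @ np.exp(-1j * m * phi) / Q for m in orders])
    b = (1j ** orders) * jv(orders, k * r)      # open-array radial term
    return C / np.where(np.abs(b) > 1e-6, b, np.nan)

Q, r, f0 = 16, 0.1, 1000.0
phi = 2 * np.pi * np.arange(Q) / Q
theta0 = np.pi / 3                              # plane-wave arrival angle
k = 2 * np.pi * f0 / 343.0
p = np.exp(1j * k * r * np.cos(phi - theta0))   # simulated plane wave
coeff = circular_pwd(p, r, f0, max_order=3)
# Each coefficient is ~ exp(-1j*m*theta0) for its order m: the phases
# encode the incidence direction of the decomposed plane wave.
```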

106 citations


Patent
03 Dec 2002
TL;DR: In this paper, a method and apparatus for reducing echo and noise is proposed, which includes a microphone array for receiving an audio signal, the audio signal including a voice signal component and a noise signal component.
Abstract: The present invention provides a solution to the needs described above through a method and apparatus for reducing echo and noise. The apparatus includes a microphone array for receiving an audio signal, the audio signal including a voice signal component and a noise signal component. The apparatus further includes a voice processing path having an input coupled to the microphone array and a noise processing path having an input coupled to the microphone array. The voice processing path is adapted to detect voice signals and the noise processing path is adapted to detect noise signals. A first echo controller is coupled to the voice processing path and a second echo controller is coupled to the noise processing path. A noise reducer is coupled to the output of the first echo controller and second echo controller.

86 citations


Journal ArticleDOI
01 Nov 2002
TL;DR: A maximum likelihood estimator for the correct position and orientation of the array is derived and used to localize and track a microphone array with a known and fixed geometrical structure, which can be viewed as the inverse sound localization problem.
Abstract: This paper introduces a mechanism for localizing a microphone array when the location of sound sources in the environment is known. Using the proposed spatial observability function based microphone array integration technique, a maximum likelihood estimator for the correct position and orientation of the array is derived. This is used to localize and track a microphone array with a known and fixed geometrical structure, which can be viewed as the inverse sound localization problem. Simulations using a two-element dynamic microphone array illustrate the ability of the proposed technique to correctly localize and estimate the orientation of the array even in a very reverberant environment. Using 1 s male speech segments from three speakers in a 7 m by 6 m by 2.5 m simulated environment, a 30 cm inter-microphone distance, and PHAT histogram SLF generation, the average localization error was approximately 3 cm with an average orientation error of 19°. The same simulation configuration but with 4 s speech segments results in an average localization error less than 1 cm, with an average orientation error of approximately 2°. Experimental examples illustrate localizations for both stationary and dynamic microphone pairs.
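
A toy two-dimensional version of this inverse problem, with a brute-force grid search standing in for the paper's spatial-observability-function machinery, is sketched below; all positions and grids are hypothetical:

```python
import numpy as np

def pair_tdoa(center, angle, d, src, c=343.0):
    """Predicted TDOA at a 2-mic array (spacing d, orientation angle)."""
    u = np.array([np.cos(angle), np.sin(angle)])
    m1, m2 = center - 0.5 * d * u, center + 0.5 * d * u
    return (np.linalg.norm(src - m1) - np.linalg.norm(src - m2)) / c

def localize_array(tdoas, sources, d, xs, angs):
    """Least-squares pose search (~ML under Gaussian TDOA error)."""
    best, best_err = None, np.inf
    for x in xs:
        for y in xs:
            for a in angs:
                pred = [pair_tdoa(np.array([x, y]), a, d, s) for s in sources]
                err = float(np.sum((np.array(pred) - tdoas) ** 2))
                if err < best_err:
                    best, best_err = (x, y, a), err
    return best

sources = [np.array([1.0, 5.0]), np.array([4.0, 1.0]), np.array([6.0, 4.0])]
true_pose = (3.0, 3.0, 0.6)                      # x, y, orientation (rad)
tdoas = np.array([pair_tdoa(np.array(true_pose[:2]), true_pose[2], 0.3, s)
                  for s in sources])
xs = np.linspace(0.0, 7.0, 36)                   # 20 cm position grid
angs = np.linspace(-np.pi, np.pi, 73)            # ~5 degree angle grid
print(localize_array(tdoas, sources, 0.3, xs, angs))  # ~ (3.0, 3.0, 0.6)
```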

85 citations


PatentDOI
TL;DR: In this article, a system facilitating signal enhancement is presented: an adaptive filter filters an input based upon a plurality of adaptive coefficients and modifies those coefficients based on a feedback output derived from a non-linear function of the reverberation-reduced output.
Abstract: A system and method facilitating signal enhancement utilizing an adaptive filter is provided. The invention includes an adaptive filter that filters an input based upon a plurality of adaptive coefficients and modifies the adaptive coefficients based on a feedback output. A feedback component provides the feedback output based, at least in part, upon a non-linear function of the acoustic reverberation reduced output. Optionally, the system can further include a linear prediction (LP) analyzer and/or a LP synthesis filter. The system can enhance signal(s), for example, to improve the quality of speech that is acquired by a microphone by reducing reverberation. The system utilizes, at least in part, the principle that certain characteristics of reverberated speech are measurably different from corresponding characteristics of clean speech. The system can employ a filter technology (e.g., reverberation reducing) based on a non-linear function, for example, the kurtosis metric.
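
A toy rendering of the kurtosis-driven adaptation idea (my sketch; the patent applies the non-linear feedback to the LP residual rather than the raw waveform, and its exact update rule is not reproduced here):

```python
import numpy as np

def kurtosis(y):
    m2 = np.mean(y ** 2) + 1e-12
    return np.mean(y ** 4) / m2 ** 2 - 3.0

def adapt_kurtosis_filter(x, taps=64, mu=1e-3, block=512):
    """Block gradient ascent on output kurtosis for an FIR filter,
    exploiting that clean speech is more kurtotic than reverberant speech."""
    h = np.zeros(taps)
    h[0] = 1.0                                   # start as pass-through
    for s in range(taps, len(x) - block, block):
        X = np.stack([x[s - i : s - i + block] for i in range(taps)])
        y = h @ X                                # filtered block
        m2 = np.mean(y ** 2) + 1e-12
        m4 = np.mean(y ** 4)
        g_y = (4.0 / block) * (y ** 3 * m2 - m4 * y) / m2 ** 3
        h += mu * X @ g_y                        # ascend d(kurtosis)/dh
        h /= np.linalg.norm(h) + 1e-12           # keep the gain bounded
    return h

h = adapt_kurtosis_filter(np.random.default_rng(0).standard_normal(16000))
```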

84 citations


Patent
15 Mar 2002
TL;DR: In this article, computer vision algorithms are used to detect, locate, and track people in the field of view of a wide-angle, stationary camera, and the estimated acoustic delay obtained from a microphone array, consisting of only two horizontally spaced microphones, is used to select the person speaking.
Abstract: A method and apparatus for a video conferencing system using an array of two microphones and a stationary camera to automatically locate a speaker and electronically manipulate the video image to produce the effect of a movable pan tilt zoom ('PTZ') camera. Computer vision algorithms are used to detect, locate, and track people in the field of view of a wide-angle, stationary camera. The estimated acoustic delay obtained from a microphone array, consisting of only two horizontally spaced microphones, is used to select the person speaking. This system can also detect any possible ambiguities, in which case, it can respond in a fail-safe way, for example, it can zoom out to include all the speakers located at the same horizontal position.
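
The two-microphone delay maps to a horizontal bearing under a far-field assumption; a minimal sketch (the 20 cm spacing is hypothetical):

```python
import numpy as np

def tdoa_to_bearing(tau, d, c=343.0):
    """Far-field bearing (0 = broadside) from the inter-mic delay tau."""
    return np.arcsin(np.clip(c * tau / d, -1.0, 1.0))

# Hypothetical 20 cm pair: a 0.3 ms delay puts the talker ~31 degrees off
# broadside; this horizontal angle is matched against the image positions
# of the tracked people to pick out the active speaker.
print(np.degrees(tdoa_to_bearing(0.3e-3, 0.20)))
```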

84 citations


Proceedings ArticleDOI
13 May 2002
TL;DR: A microphone array post-filtering approach, applicable to adaptive beamformers, that differentiates non-stationary noise components from speech components is introduced, based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique.
Abstract: Microphone array post-filtering allows additional reduction of noise components at a beamformer output. Existing techniques are either restricted to classical delay-and-sum beamformers, or are based on single-channel speech enhancement algorithms that are inefficient at attenuating highly non-stationary noise components. In this paper, we introduce a microphone array post-filtering approach, applicable to adaptive beamformers, that differentiates non-stationary noise components from speech components. The ratio between the transient power at the beamformer primary output and the transient power at the reference noise signals is used for indicating whether such a transient is desired or interfering. Based on a Gaussian statistical model and combined with an appropriate spectral enhancement technique, a significantly reduced level of non-stationary noise is achieved without further distorting speech components. Experimental results demonstrate the effectiveness of the proposed method.
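
A simplified sketch of the transient-power-ratio test (my reading of the abstract; the paper's estimator is more elaborate and is embedded in a Gaussian statistical model):

```python
import numpy as np

def transient_ratio(primary, refs, alpha=0.9):
    """Ratio of transient power at the beamformer output to transient
    power at the noise references (per frame, for one frequency bin).
    Transient = excess of the instantaneous PSD over a smoothed floor."""
    def transient(psd):
        psd = np.asarray(psd, dtype=float)
        floor, acc = np.empty_like(psd), psd[0]
        for t, v in enumerate(psd):
            acc = alpha * acc + (1.0 - alpha) * v   # recursive smoothing
            floor[t] = acc
        return np.maximum(psd - floor, 0.0)

    tp = transient(primary)
    tr = sum(transient(r) for r in refs) / len(refs)
    # Ratio well above 1: speech-like (desired) transient, preserve it;
    # near or below 1: interfering transient, suppress it.
    return tp / (tr + 1e-12)
```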

76 citations


Journal ArticleDOI
TL;DR: This talk describes the decomposition of the sound field into orthogonal components, the so-called spherical harmonics, which are directly related to the method of Ambisonics and contain all required information to allow a reconstruction of the original sound field.
Abstract: The progression of audio from monophonic to the present-day 5-channel playback is being driven by the desire to improve the immersion of the listener into the acoustic scene. In the limit, the goal is the reconstruction of the original sound field. This talk describes the decomposition of the sound field into orthogonal components, the so-called spherical harmonics; this decomposition is directly related to the method of Ambisonics. These components contain all required information to allow a reconstruction of the original sound field. This approach is scalable to any number of loudspeakers and is also backwards compatible with surround sound, stereo, and mono playback. One problem is the recording of the orthogonal components. So far, only solutions exist that allow the recording of spherical harmonics up to first order. This limits the spatial resolution. This presentation introduces a new microphone that overcomes this limitation. It consists of pressure sensors that are equally distributed on the surface of a rigid sphere. The number of sensors depends on the highest-order spherical harmonic to be recorded. A minimum of (n+1)² sensors is required to record harmonics up to nth order. The sensor signals are then processed to give the desired spherical harmonic outputs.

Patent
18 Jul 2002
TL;DR: In this paper, a system for recording and reproducing a three dimensional auditory scene for individual listeners includes one or more microphone arrays (2 and 16); a support (3) for holding, moving the microphone array and also for attaching other devices (14); a data storage and encoding device (9); a control interface (13), and a processor and decoding device (10).
Abstract: A system for recording and reproducing a three dimensional auditory scene for individual listeners includes one or more microphone arrays (2 and 16); a support (3) for holding, moving the microphone array and also for attaching other devices (14); a data storage and encoding device (9); a control interface (13), and a processor and decoding device (10). The microphones in the microphone array (2) preferably have strong directional characteristics. The microphone array support mount (4) can support one or more physical structures (5) to provide directional acoustic filtering. The directional microphone array is electrically connected via a lead (8) to the sound encoding processor (9) and sound decoding processor (10). As the directional microphone array has acoustically directional properties, these properties can be adjusted using signal processing methods to match the acoustics of the external ears of the individual listener and thus result in a perceptually accurate recording and reproduction of a three dimensional auditory scene for the individual listener.

Proceedings ArticleDOI
17 Jun 2002
TL;DR: In this paper, an aeroacoustic study of a 26%-scale landing gear model was conducted in the NASA Ames 7- by 10-Foot Wind Tunnel using a phased microphone array.
Abstract: An aeroacoustic study of a 26%-scale landing gear model was conducted in the NASA Ames 7- by 10-Foot Wind Tunnel using a phased microphone array. The incorporation of complex parts via stereolithography produced a model that can mimic full-scale details down to 3 mm. These details include the contours, brake cylinders, bolt holes, and wheel hubs that appear on the real landing gear. Major noise sources were identified and ranked. From the sideline view, the noise levels of the cable harness and torque link were each at least 8 dB above that of a clean configuration. Sources from the more ambiguous fly-over view, such as the front axle, center axle, and rear axle regions, were 11 dB above the clean configuration for frequencies below 2000 Hz full-scale. This increment in noise likely included other sources situated behind the truck. Referenced to the clean configuration, the braces and links contributed as much as 8 dB. Tests with a fully sealed fairing on the landing gear suggest that, through careful design of major components, a noise reduction of up to 15 dB can be achieved, although 2 to 6 dB of noise reduction is probably a more realistic goal.

Proceedings ArticleDOI
13 May 2002
TL;DR: A novel technique for estimating the signal power spectral density to be used in the transfer function of a microphone array post-filter is proposed, which results in significant improvement in terms of objective speech quality measures and speech recognition performance.
Abstract: This paper proposes a novel technique for estimating the signal power spectral density to be used in the transfer function of a microphone array post-filter. The technique is a modification of the existing Zelinski post-filter, which uses the auto- and cross-spectral densities of the array inputs to estimate the signal and noise spectral densities. The Zelinski technique, however, assumes zero cross-correlation between noise on different sensors. This assumption is inaccurate in real conditions, particularly at low frequencies and for arrays with closely spaced sensors. In this paper we replace this with an assumption of a theoretically diffuse noise field, which is more appropriate in a variety of realistic noise environments. In experiments using noise recordings from an office of computer workstations, the modified post-filter results in significant improvement in terms of objective speech quality measures and speech recognition performance.
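
A sketch of the modified post-filter for one frequency bin, assuming time-aligned inputs and the sinc coherence of a spherically diffuse field (function names are mine; the paper defines the estimator in terms of auto- and cross-spectral densities):

```python
import numpy as np

def diffuse_coherence(f, d, c=343.0):
    """Coherence of a spherically diffuse noise field: sinc(2*pi*f*d/c).
    np.sinc(t) is sin(pi*t)/(pi*t), hence the division by pi."""
    return np.sinc(2.0 * f * d / c)

def postfilter_gain(Phi, f, positions, c=343.0):
    """Diffuse-field-modified Zelinski gain for one frequency bin.
    Phi: (M, M) cross-spectral matrix of the time-aligned mic signals."""
    M = Phi.shape[0]
    num, count = 0.0, 0
    for i in range(M):
        for j in range(i + 1, M):
            d = np.linalg.norm(positions[i] - positions[j])
            g = min(diffuse_coherence(f, d, c), 0.99)   # avoid 1/(1-g) blow-up
            # Per-pair signal PSD estimate under the diffuse-noise assumption
            s_ij = (np.real(Phi[i, j])
                    - 0.5 * g * np.real(Phi[i, i] + Phi[j, j])) / (1.0 - g)
            num += s_ij
            count += 1
    s_hat = num / count
    denom = np.real(np.trace(Phi)) / M
    return float(np.clip(s_hat / (denom + 1e-12), 0.0, 1.0))

positions = np.array([[0.0, 0.0], [0.05, 0.0], [0.10, 0.0], [0.15, 0.0]])
rng = np.random.default_rng(0)
X = rng.standard_normal(4) + 1j * rng.standard_normal(4)  # one bin snapshot
Phi = np.outer(X, X.conj())                               # toy CSD estimate
print(postfilter_gain(Phi, 1000.0, positions))
```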

Journal ArticleDOI
Markus Buck1
01 Mar 2002
TL;DR: In this contribution the performance of first-order differential arrays is examined: important parameters such as array response and directivity index are thoroughly studied, and the frequency dependence of these parameters is investigated in order to derive limits for a favorable frequency range.
Abstract: Array processing for broadband signals like speech is not a trivial task. Most sensor arrangements and the related beamforming algorithms show strong frequency-dependent characteristics. Differential microphone arrays promise frequency-independent high directional gain with compact arrangements. However, this type of microphone array also has the problem that even small deviations in microphone properties can cause severe degradation of the array's performance. Especially at low frequencies, the achievable gain is limited by the influence of sensor mismatch and sensor noise. In this contribution the performance of first-order differential arrays is examined, and important parameters such as array response and directivity index are thoroughly studied. The frequency dependence of these parameters is investigated in order to derive limits for a favorable frequency range. First, the performance of the array is examined for ideal sensors. Then, two different models for sensor mismatch are introduced to describe the degradation of the array's performance. In addition, solutions for sensor calibration and experimental results with real microphones are presented.
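
The low-frequency sensitivity to mismatch can be reproduced numerically. A minimal sketch (an idealized two-omni differential pair with a gain error on one sensor; parameter values are illustrative):

```python
import numpy as np

def pair_response(f, theta, d=0.02, kappa=1.0, mismatch=0.0, c=343.0):
    """First-order differential pair: front sensor minus delayed rear
    sensor. kappa scales the internal delay (kappa=1 gives a cardioid);
    mismatch is a relative gain error on the rear sensor."""
    w = 2 * np.pi * f
    tau = kappa * d / c
    return 1.0 - (1.0 + mismatch) * np.exp(-1j * w * (tau + d * np.cos(theta) / c))

def directivity_index(f, **kw):
    theta = np.linspace(0.0, np.pi, 2001)
    p2 = np.abs(pair_response(f, theta, **kw)) ** 2
    dth = theta[1] - theta[0]
    mean_p2 = np.sum(p2 * np.sin(theta)) * dth / 2.0   # average over sphere
    on_axis = np.abs(pair_response(f, 0.0, **kw)) ** 2
    return 10.0 * np.log10(on_axis / mean_p2)

# Ideal sensors give ~4.8 dB for a cardioid at low frequency; a 1 % gain
# mismatch adds an omnidirectional leak that erodes the DI as f drops.
for f in (200.0, 1000.0, 4000.0):
    print(f, directivity_index(f), directivity_index(f, mismatch=0.01))
```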

PatentDOI
TL;DR: A second-order adaptive differential microphone array (ADMA) has two first-order elements (e.g., 802 and 804 of FIG. 8), each configured to convert a received audio signal into an electrical signal as discussed by the authors.
Abstract: A second-order adaptive differential microphone array (ADMA) has two first-order elements (e.g., 802 and 804 of FIG. 8), each configured to convert a received audio signal into an electrical signal. The ADMA also has (i) two delay nodes (e.g., 806 and 808 ) configured to delay the electrical signals from the first-order elements and (ii) two subtraction nodes (e.g., 810 and 812 ) configured to generate forward-facing and backward-facing cardioid signals based on differences between the electrical signals and the delayed electrical signals. The ADMA also has (i) an amplifier (e.g., 814 ) configured to amplify the backward-facing cardioid signal by a gain parameter; (ii) a third subtraction node (e.g., 816 ) configured to generate a difference signal based on a difference between the forward-facing cardioid signal and the amplified backward-facing cardioid signal; and (iii) a lowpass filter (e.g., 818 ) configured to filter the difference signal from the third subtraction node to generate the output signal for the second-order ADMA. The gain parameter for the amplifier can be adaptively adjusted to move a null in the back half plane of the ADMA to track a moving noise source. In a subband implementation, a different gain parameter can be adaptively adjusted to move a different null in the back half plane to track a different moving noise source for each different frequency subband.
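
A sketch of the patent's back-end structure, plus a common power-minimizing adaptation rule for the gain parameter (the adaptation specifics here are a standard choice, not necessarily the patent's exact update):

```python
import numpy as np

def adma_output(e1, e2, delay, beta):
    """Second-order ADMA back end: form forward- and backward-facing
    cardioid signals from two first-order element signals e1/e2, scale
    the backward one by beta, and subtract. delay is an integer >= 1;
    the final lowpass equalization is omitted in this sketch."""
    d1 = np.concatenate((np.zeros(delay), e1[:-delay]))   # delayed e1
    d2 = np.concatenate((np.zeros(delay), e2[:-delay]))   # delayed e2
    cf = e1 - d2            # forward-facing cardioid
    cb = e2 - d1            # backward-facing cardioid
    return cf - beta * cb

def adapt_beta(cf, cb, beta, mu=0.01):
    """NLMS-style update that steers the rear null toward a moving noise
    source by minimizing output power; clipping beta to [0, 1] keeps the
    null in the back half plane."""
    y = cf - beta * cb
    beta += mu * np.mean(y * cb) / (np.mean(cb ** 2) + 1e-12)
    return float(np.clip(beta, 0.0, 1.0))

fs = 16000
t = np.arange(fs) / fs
e1 = np.sin(2 * np.pi * 440 * t)              # stand-in element signals
e2 = np.roll(e1, 3)
y = adma_output(e1, e2, delay=3, beta=0.3)
```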

Proceedings ArticleDOI
17 Jun 2002
TL;DR: It is demonstrated that nested arrays must be used for a study over a wide frequency range, and that comparisons of the noise maps between different arrays provide valuable information about the noise sources.
Abstract: Flyover measurements with a phased array of microphones extending over an area of 16 m by 16 m are reported. The 161-microphone array was made possible by combining hardware from ONERA and DLR. In this investigation of the airframe noise of an Airbus A340, the flyover altitudes were between 90 m and 165 m. The data reduction methods for moving objects of DLR and ONERA are compared. Some source maps are shown and discussed. It is demonstrated that nested arrays must be used for a study over a wide frequency range, and that comparisons of the noise maps between different arrays provide valuable information about the noise sources. The ONERA method is shown to be a powerful data reduction method based on a small number of microphones, while the DLR method results in alias-free maps at the expense of a much larger number of microphones.

Journal ArticleDOI
01 Jan 2002
TL;DR: In this article, the use of a directional array of microphones for the measurement of trailing edge (TE) noise is described, and the capabilities of this method are evaluated via measurements of TE noise from a NACA 63-215 airfoil model and from a cylindrical rod.
Abstract: The use of a directional array of microphones for the measurement of trailing edge (TE) noise is described. The capabilities of this method are evaluated via measurements of TE noise from a NACA 63-215 airfoil model and from a cylindrical rod. This TE noise measurement approach is compared to one that is based on the cross spectral analysis of output signals from a pair of microphones (COP method). Advantages and limitations of both methods are examined. It is shown that the microphone array can accurately measure TE noise and capture its two-dimensional characteristic over a large frequency range for any TE configuration, as long as noise contamination from extraneous sources is within bounds. The COP method is shown to also accurately measure TE noise, but over a more limited frequency range that narrows for increased TE thickness. Finally, the applicability and generality of an airfoil self-noise prediction method was evaluated via comparison to the experimental data obtained using the COP and array measurement methods. The predicted and experimental results are shown to agree over large frequency ranges.
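
The COP idea, i.e., that the cross-spectrum of two microphones retains the mutually coherent source while averaging out uncorrelated background noise, can be sketched with synthetic signals:

```python
import numpy as np
from scipy.signal import csd

fs = 51200
rng = np.random.default_rng(0)
coherent = rng.standard_normal(10 * fs)              # shared TE-noise stand-in
x1 = coherent + 0.5 * rng.standard_normal(10 * fs)   # mic 1 with local noise
x2 = coherent + 0.5 * rng.standard_normal(10 * fs)   # mic 2 with local noise

# |Pxy| approaches the PSD of the coherent source alone as averaging
# suppresses the uncorrelated per-microphone noise.
f, Pxy = csd(x1, x2, fs=fs, nperseg=4096)
```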


Patent
01 Aug 2002
TL;DR: A method of identifying talker location is presented: audio signals picked up by a steerable microphone array are processed to determine the location of an active talker, the array is steered in that direction, and a cue is generated to identify the direction in which the array has been steered.
Abstract: A method of identifying talker location includes picking up audio signals using a steerable microphone array and processing the picked up audio signals to determine the location of an active talker. The microphone array is then steered in the direction of the active talker and a cue is generated to identify the direction in which the microphone array has been steered.

Proceedings ArticleDOI
13 May 2002
TL;DR: This work presents an audio-video localization technique that combines the benefits of the two modalities, achieving an 8.9 dB improvement over a single far-field microphone, a 6.7 dB improvement over source separation based on video-only localization, and a 0.3 dB improvement over separation based on audio-only localization.
Abstract: Steerable microphone arrays provide a flexible infrastructure for audio source separation. In order for them to be used effectively in intelligent environments, there must be a mechanism in place for steering the focus of the array to the sound source. Audio-only steering techniques often perform poorly in the presence of multiple sound sources or strong reverberation. Video-only techniques can achieve high spatial precision but require that the audio and video subsystems be accurately calibrated to preserve this precision. We present an audio-video localization technique that combines the benefits of the two modalities. We implement our technique in a test environment containing multiple stereo cameras and a room-sized microphone array. Our technique achieves an 8.9 dB improvement over a single far-field microphone, a 6.7 dB improvement over source separation based on video-only localization, and a 0.3 dB improvement over separation based on audio-only localization.

Proceedings ArticleDOI
04 Aug 2002
TL;DR: A novel decomposition of the estimation problems for short-time spectral amplitude (STSA), log STSA, and phase in the Bayesian estimation framework is presented, based on the notion of sufficient statistics for the microphone array case.
Abstract: Microphone arrays provide new opportunities for noise reduction and speech enhancement. This paper presents a novel decomposition of the estimation problems for short-time spectral amplitude (STSA), log STSA, and phase in the Bayesian estimation framework. The decomposition is based on the notion of sufficient statistics for the microphone array case. It nicely generalizes the well-known single-channel Ephraim-Malah estimators (1984, 1985) to the microphone array case. We also compare noise reduction obtained in the single channel with the two- and four-channel cases on real data.
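
For reference, the single-channel Ephraim-Malah STSA gain that the paper generalizes (in the multichannel case, the a priori and a posteriori SNRs are computed from the sufficient statistic, essentially a beamformer output, instead of a single microphone):

```python
import numpy as np
from scipy.special import i0e, i1e

def stsa_gain(xi, gamma):
    """Ephraim-Malah short-time spectral amplitude gain.
    xi: a priori SNR, gamma: a posteriori SNR for one frequency bin."""
    v = xi / (1.0 + xi) * gamma
    # i0e/i1e are exponentially scaled Bessel functions, i0e(x)=I0(x)e^-x,
    # so exp(-v/2)*I0(v/2) = i0e(v/2); this avoids overflow for large v.
    return (np.sqrt(np.pi) / 2.0) * (np.sqrt(v) / gamma) \
        * ((1.0 + v) * i0e(v / 2.0) + v * i1e(v / 2.0))

print(stsa_gain(10.0, 12.0))   # high-SNR bin: gain approaches xi/(1+xi)
```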

PatentDOI
Hagai Attias1, Li Deng1
TL;DR: In this article, a system and method for signal enhancement utilizing mixture models is presented, which employs probabilistic modeling to perform signal enhancement of a plurality of windowed frequency transformed input signals received for an array of microphones.
Abstract: A system and method facilitating signal enhancement utilizing mixture models is provided. The invention includes a signal enhancement adaptive system having a speech model, a noise model and a plurality of adaptive filter parameters. The signal enhancement adaptive system employs probabilistic modeling to perform signal enhancement of a plurality of windowed frequency transformed input signals received, for example, for an array of microphones. The signal enhancement adaptive system incorporates information about the statistical structure of speech signals. The signal enhancement adaptive system can be embedded in an overall enhancement system which also includes components of signal windowing and frequency transformation.

Journal Article
TL;DR: This paper discusses several existing array techniques, introducing a variety of application targets for the HUT microphone probe, which was originally designed for measurement purposes but has proven useful in other applications as well.
Abstract: The soundfield inside an enclosed space depends in a complex way upon interactions between emitted sound waves and different reflecting, diffracting, and scattering surfaces. 3-D microphone arrays provide tools for investigating and recording these interactions. This paper discusses several existing array techniques, introducing a variety of application targets for the HUT microphone probe. Applications include directional measurement, analysis, and visualization of room responses, estimation of room parameters, and analysis of source and surface positions. In a dynamic case the probe can be utilized in source tracking and beam steering, as well as in tracking its own position. Furthermore, the probe can be used to simulate some microphone arrays commonly used in surround sound recording. In each application case, both the general theory and its relation to the HUT probe are discussed.

INTRODUCTION: In all practical situations the soundfield inside an enclosed space consists of acoustical waves propagating in several different directions. Sound waves emitted by a source are reflected, diffracted, and scattered by different obstacles, including the walls of the enclosure. This results in a complex field, the properties of which cannot be comprehensively captured in one-dimensional signals or parameters. Different practitioners have different viewpoints on spatial sound. Researchers and acoustical engineers often want to measure a response to a given stimulus in a room, in order to gain information about the reasons why the room sounds like it does. The motivation for this may be an attempt to change the acoustics, or a pure scientific interest in the underlying phenomena, including perception of spatial sound. A recording engineer, on the other hand, may want to capture a performance in a room so that an illusion of the room can be later reproduced somewhere else. Another related problem is selective recording of certain sound sources. A third distinct area of interest is acoustical orientation, i.e., localization of sound sources, reflecting obstacles and surfaces, or the receiver itself based on received sound signals. Of course, there are also intersections between these viewpoints.

[Fig. 1: The HUT 3-D microphone probe.]

This paper describes applications of a 3-D microphone array in all the previously mentioned tasks. The HUT microphone probe consists of 12 miniature electret microphone capsules arranged as two concentric pairs on each of the x-, y-, and z-coordinate axes. The inner pairs are set with a spacing of 10 mm and the outer pairs with a spacing of 100 mm between the capsules. The probe was originally designed for measurement purposes but has proven useful in other applications as well. A picture of the probe is shown in Fig. 1. Related hardware and software are described in [1] and [2]. The paper is divided into three parts. The first part discusses directional measurement and analysis of room responses. The second part presents some possibilities of using the probe in sound recording. Finally, the third part introduces source localization techniques applicable both for measurement purposes and for acoustical orientation.

ROOM RESPONSE MEASUREMENTS: Measurement of room responses and analysis of related attributes is a common task in audio and acoustics. Most often an omnidirectional response to a preferably omnidirectional stimulus is acquired. This is sufficient for calculation of several room-acoustical parameters. However, single omnidirectional responses and standard parameters provide only limited information about the actual acoustics of the room and its perceptual properties. Microphone array techniques utilized in room response measurements can be roughly divided into two categories. In the first category, large arrays spanning a significant distance, area, or volume in the room are utilized. This gives a representation of the evolving soundfield as a function of spatial position. Application of a long line array for this purpose has been described in [3]. In the second category, small arrays such as the HUT microphone probe are used to give a listener-centered view of the directional soundfield. The following discussion concentrates on methods in the latter category.

Directional sound pressure components: Ideal omnidirectional microphones are sensitive to sound pressure, which as a scalar quantity does not include any directional information. Systems with varying directional sensitivity can be formed by appropriately combining the signals of two or more closely spaced omnidirectional microphones. An attractive feature of using an array of omnidirectional microphones in place of directional microphones is the possibility to vary and steer the directivity patterns later in the postprocessing phase. First-order differential directivity patterns can be easily created using the signals of a closely spaced pair of microphones. This kind of beamforming method is analogous to the construction of microphones with built-in directionality [4, 5]. Basically, all that is needed is some equalization and delay, and a weighted summation of the resulting signals. An ideal dipole has a directivity pattern of the form
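
The excerpt cuts off here; the ideal dipole pattern it is about to give is proportional to cos θ. The delay-equalize-subtract synthesis just described can be sketched for a closely spaced omni pair (the spacing matches the probe's 10 mm inner pairs; the frequency is illustrative):

```python
import numpy as np

def pair_pattern(f, theta, d=0.01, kappa=0.0, c=343.0):
    """First-order pattern synthesized from two omni capsules (spacing d)
    by delaying one signal by kappa*d/c and subtracting: kappa=0 gives a
    dipole (~cos theta), kappa=1 a cardioid. The pattern can be changed
    in postprocessing, since only the recorded omni signals are needed."""
    w = 2 * np.pi * f
    h = 1.0 - np.exp(-1j * w * (kappa * d / c + d * np.cos(theta) / c))
    return np.abs(h) / np.max(np.abs(h))       # normalized magnitude

theta = np.linspace(0.0, 2.0 * np.pi, 361)
dipole = pair_pattern(1000.0, theta, kappa=0.0)
cardioid = pair_pattern(1000.0, theta, kappa=1.0)
```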

Journal ArticleDOI
TL;DR: Improvements in recognition accuracy due to multiple microphones, HMM training on contaminated speech, and incremental adaptation are additive on a connected-digits task, and the results show that unsupervised incremental adaptation receives the benefits of starting from models trained using contaminated speech.


Proceedings ArticleDOI
13 May 2002
TL;DR: Improvements in word error rate are achieved on real microphone array tasks in a wide range of environments through the use of a new objective function which utilizes information from the recognition system itself, obtained in an unsupervised manner, to optimize the parameters of a filter-and-sum array processor.
Abstract: We present a new array processing algorithm for microphone array speech recognition. Conventionally, the goal of array processing is to take distorted signals captured by the array and generate a cleaner output waveform. However, speech recognition systems operate on a set of features derived from the waveform, rather than the waveform itself. The goal of an array processor used in conjunction with a recognition system is to generate a waveform which produces a set of recognition features which maximize the likelihood for the words that are spoken, rather than to minimize the waveform distortion. We propose a new array processing algorithm which maximizes the likelihood of the recognition features. This is accomplished through the use of a new objective function which utilizes information from the recognition system itself, obtained in an unsupervised manner, to optimize the parameters of a filter-and-sum array processor. Using the proposed method, improvements in word error rate of up to 36% over conventional methods are achieved on real microphone array tasks in a wide range of environments.
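
The underlying filter-and-sum structure is simple; what the paper changes is the objective used to choose the taps. A minimal sketch (delay-only filters shown as a degenerate case):

```python
import numpy as np

def filter_and_sum(signals, filters):
    """Filter-and-sum processing: convolve each channel with its FIR
    filter and add. In the paper, the taps are optimized to maximize the
    likelihood of the recognizer's features, not to minimize waveform
    distortion; that optimization is not reproduced here."""
    return sum(np.convolve(x, h)[:len(x)] for x, h in zip(signals, filters))

fs = 16000
rng = np.random.default_rng(0)
signals = [rng.standard_normal(fs) for _ in range(4)]       # stand-in channels
filters = [np.eye(1, 32, k).ravel() for k in (0, 2, 4, 6)]  # pure delays
y = filter_and_sum(signals, filters)    # reduces to delay-and-sum here
```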

Patent
19 Dec 2002
TL;DR: In this article, a planar array of three or more microphones may be placed on a portable device, such as a handheld computer or a personal digital assistant, in conjunction with a signal processing circuit, defining a direction of sensitivity.
Abstract: Embodiments of the invention include a device and a method for translating words spoken in one language to a graphic or audible version of the words in a second language. A planar array of three or more microphones may be placed on a portable device, such as a handheld computer or a personal digital assistant. The planar array, in conjunction with a signal processing circuit, defines a direction of sensitivity. In a noisy environment, spoken words originating from the direction of sensitivity are selected and other sounds are rejected. The spoken words are recognized and translated, and the translation is displayed on a display screen and/or issued via a speaker.

Proceedings ArticleDOI
17 Jun 2002
TL;DR: In this paper, a test of the capability of phased microphone arrays for the investigation of wake vortices is presented, and it is shown that it is possible to estimate the frequency spectrum of wake-vortex noise, its spatial sound source distribution and even the trajectory of the two counter-rotating vortice.
Abstract: Wake vortices of landing aircraft emit a faint noise that is audible when wind speed is low and might be related to aerodynamic parameters of the vortices. Results from a test of the capability of phased microphone arrays for the investigation of wake vortices are presented. It is shown that it is possible to estimate the frequency spectrum of wake-vortex noise, its spatial sound source distribution and even the trajectory of the two counter-rotating vortices. The paper presents vortex-noise frequency spectra and sound source distributions of the wake vortices measured behind a Boeing 737, Boeing 757 and an Airbus A320. The phased microphone array technique is shown to be an appropriate tool for wake vortex investigations.

PatentDOI
Toshihiko Kataoka1
TL;DR: In this article, an attention direction of a robot, indicated by a face, eyes or the like thereof, can be aligned with a directivity direction of the microphone array, and voice recognition can be performed with an input of a delay sum corresponding to the attention direction.
Abstract: An attention direction of a robot, indicated by a face, eyes or the like thereof, can be aligned with a directivity direction of a microphone array. Specifically, an acoustic signal from a sound source can be captured, and input signals for individual microphones can be generated. A direction of the sound source can be estimated from the input signals. A visual line of the robot, a posture thereof, or both, can be controlled such that the attention direction of the robot coincides with the direction of the sound source. Then, the directivity direction of the microphone array can be aligned with the attention direction. Thereafter, voice recognition can be performed with an input of a delay sum corresponding to the directivity direction.
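
The "delay sum" front end can be sketched in the frequency domain: phase-align each microphone toward the estimated direction and average (a generic delay-and-sum sketch, not the patent's exact processing):

```python
import numpy as np

def delay_and_sum(frames_fd, freqs, mic_pos, direction, c=343.0):
    """Frequency-domain delay-and-sum. frames_fd: (freqs, mics) STFT
    frame; mic_pos: (mics, 3) positions in metres; direction: unit
    vector from the array toward the estimated source."""
    delays = mic_pos @ direction / c                       # (mics,) seconds
    phases = np.exp(-2j * np.pi * np.outer(freqs, delays)) # undo arrivals
    return np.mean(frames_fd * phases, axis=1)             # aligned average

rng = np.random.default_rng(0)
mic_pos = rng.uniform(-0.1, 0.1, (4, 3))          # hypothetical geometry
freqs = np.linspace(0.0, 8000.0, 257)
frames = rng.standard_normal((257, 4)) + 1j * rng.standard_normal((257, 4))
direction = np.array([1.0, 0.0, 0.0])             # estimated source direction
enhanced = delay_and_sum(frames, freqs, mic_pos, direction)
```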