
Showing papers on "Microphone array published in 2005"


Patent
23 Feb 2005
TL;DR: The generic beamformer described in this patent automatically designs a set of beams (i.e., beamforming) that cover a desired angular space range within a prescribed search area; the beam design is a function of microphone geometry and operational characteristics, and also of noise models of the environment around the microphone array.
Abstract: The ability to combine multiple audio signals captured from the microphones in a microphone array is frequently used in beamforming systems. Typically, beamforming involves processing the output audio signals of the microphone array in such a way as to make the microphone array act as a highly directional microphone. In other words, beamforming provides a "listening beam" which points to a particular sound source while often filtering out other sounds. A "generic beamformer," as described herein, automatically designs a set of beams (i.e., beamforming) that cover a desired angular space range within a prescribed search area. Beam design is a function of microphone geometry and operational characteristics, and also of noise models of the environment around the microphone array. One advantage of the generic beamformer is that it is applicable to any microphone array geometry and microphone type.
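To make the "listening beam" idea concrete, here is a minimal delay-and-sum beamformer sketch, the simplest fixed beamformer a microphone array can implement. It is not the patent's design algorithm; the geometry, sampling rate, and steering direction below are illustrative assumptions.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, steer_dir, fs, c=343.0):
    """Steer a far-field beam toward the unit vector `steer_dir`.

    signals:       (n_mics, n_samples) time-domain microphone data
    mic_positions: (n_mics, 3) microphone coordinates in meters
    """
    # Arrival-time advance of each mic for a far-field source in `steer_dir`.
    advances = mic_positions @ steer_dir / c
    n_mics, n_samples = signals.shape
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    out = np.zeros(len(freqs), dtype=complex)
    for m in range(n_mics):
        # Delay each channel by its advance so the target direction adds coherently.
        out += np.fft.rfft(signals[m]) * np.exp(-2j * np.pi * freqs * advances[m])
    return np.fft.irfft(out / n_mics, n=n_samples)

# Example (hypothetical): 4-mic linear array with 5 cm spacing at 8 kHz.
fs = 8000
mics = np.array([[0.05 * i, 0.0, 0.0] for i in range(4)])
x = np.random.randn(4, 1024)          # placeholder microphone signals
y = delay_and_sum(x, mics, np.array([0.0, 1.0, 0.0]), fs)
```

A designed beamformer like the one in the patent would replace the uniform 1/n_mics weights with per-frequency weights optimized against the noise models and microphone directivity patterns.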

164 citations


Journal ArticleDOI
TL;DR: In this paper, a spherical microphone array configured around a rigid sphere is designed, analyzed using simulation, and then used experimentally to decompose the sound field in an anechoic chamber and an auditorium into waves.
Abstract: Directional sound-field information is becoming more important in sound-field analysis and auditorium acoustics, and, as a consequence, a variety of microphone arrays have recently been studied that provide such information. In particular, spherical microphone arrays have been proposed that provide three-dimensional information by decomposing the sound field into spherical harmonics. The theoretical formulation of the plane-wave decomposition and array performance analysis were also presented. In this paper, as a direct continuation of the recent work, a spherical microphone array configured around a rigid sphere is designed, analyzed using simulation, and then used experimentally to decompose the sound field in an anechoic chamber and an auditorium into waves. The array employs a maximum of 98 measurement positions around the sphere, and is used to compute spherical harmonics up to order 6. In the current paper we investigate the factors affecting the performance of plane-wave decomposition, showing that the direct sound and several reflections in an auditorium can be identified experimentally. This suggests that the microphone arrays studied here can be employed in various acoustic applications to identify the characteristics of reverberant sound fields.
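As a sketch of the core operation behind such arrays, the snippet below estimates spherical-harmonic coefficients from pressure samples on a sphere by discrete quadrature. The uniform quadrature weights and random sampling layout are assumptions for illustration; the paper's 98-point layout and rigid-sphere radial compensation are not reproduced here.

```python
import numpy as np
from scipy.special import sph_harm

def sh_coefficients(pressure, theta, phi, order):
    """Estimate coefficients p_nm up to `order` from pressure samples.

    pressure: (Q,) complex pressure at Q points on the sphere
    theta:    (Q,) azimuth angles in [0, 2*pi)
    phi:      (Q,) polar (colatitude) angles in [0, pi]
    """
    Q = len(pressure)
    w = 4.0 * np.pi / Q              # uniform quadrature weight (assumption)
    coeffs = {}
    for n in range(order + 1):
        for m in range(-n, n + 1):
            # scipy's classical convention: sph_harm(m, n, azimuth, colatitude)
            Y = sph_harm(m, n, theta, phi)
            coeffs[(n, m)] = w * np.sum(pressure * np.conj(Y))
    return coeffs

# Example: 98 quasi-uniform points and order 6, matching the paper's numbers.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 98)
phi = np.arccos(rng.uniform(-1, 1, 98))   # uniform cos(phi) -> area-uniform
p = rng.standard_normal(98) + 1j * rng.standard_normal(98)
c = sh_coefficients(p, theta, phi, order=6)
```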

135 citations


Journal ArticleDOI
TL;DR: Novel frequency-domain approaches for TDOA calculation in a reverberant and noisy environment are presented, based on the speech quasistationarity property, noise stationarity, and the fact that the speech and the noise are uncorrelated.
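For reference, a minimal PHAT-weighted generalized cross-correlation (GCC-PHAT) estimator is sketched below; it is the common frequency-domain baseline for TDOA, while the paper's estimators additionally exploit speech quasistationarity and noise stationarity, which this sketch does not.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Delay of y relative to x in seconds (positive if y lags x)."""
    n = len(x) + len(y)                    # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = Y * np.conj(X)
    R /= np.abs(R) + 1e-12                 # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Example: y is x delayed by 25 samples, so gcc_phat returns ~25/fs.
fs = 16000
x = np.random.randn(4096)
y = np.concatenate((np.zeros(25), x))[:4096]
tau = gcc_phat(x, y, fs)
```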

114 citations


Book ChapterDOI
11 Jul 2005
TL;DR: The AMI transcription system for speech in meetings developed in collaboration by five research groups includes generic techniques such as discriminative and speaker adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone posterior based features, as well as techniques specifically designed for meeting data.
Abstract: The automatic processing of speech collected in conference style meetings has attracted considerable interest with several large scale projects devoted to this area. This paper describes the development of a baseline automatic speech transcription system for meetings in the context of the AMI (Augmented Multi-party Interaction) project. We present several techniques important to the processing of this data and show the performance in terms of word error rates (WERs). An important aspect of transcription of this data is the necessary flexibility in terms of audio pre-processing. Real-world systems have to deal with flexible input, for example by using microphone arrays or randomly placed microphones in a room. Automatic segmentation and microphone array processing techniques are described and their effect on WERs is discussed. The system and its components presented in this paper yield competitive performance and form a baseline for future research in this domain.

105 citations


Journal ArticleDOI
TL;DR: This work investigates interesting connections between BSS and ideal beamforming, which leads to a permutation alignment scheme based on microphone array directivity patterns and proposes a multistage algorithm, which aligns the unmixing filter permutations without sacrificing the spectral resolution.
Abstract: Acoustic reverberation severely limits the performance of multiple microphone blind speech separation (BSS) methods. We show that the limited performance is due to random permutations of the unmixing filters over frequency. This problem, which we refer to as permutation inconsistency, becomes worse as the length of the room impulse response increases. We explore interesting connections between BSS and ideal beamforming, which leads us to propose a permutation alignment scheme based on microphone array directivity patterns. Given that the permutations are properly aligned, we show that the blind speech separation method outperforms the nonblind beamformer in a highly reverberant environment. Furthermore, we discover the tradeoff where permutations can be aligned by affording a loss in spectral resolution of the unmixing filters. We then propose a multistage algorithm, which aligns the unmixing filter permutations without sacrificing the spectral resolution. For our study, we perform experiments in both real and simulated environments and compare the results to the ideal performance benchmarks that we derive using prior knowledge of the mixing filters.

102 citations


Journal ArticleDOI
TL;DR: It is found that many small features are required to make a useful location estimating algorithm work and work well in real-time, and the current LEMSalg is being used successfully in a representative environment where microphone SNRs are below 0 dB.
Abstract: A large array of microphones is being studied as a possible means of acquiring data in offices, conference rooms, and auditoria without requiring close-talking microphones. An array that surrounds all possible sources has a large aperture and such arrays have attractive properties for accurate spatial resolution and significant signal-to-noise enhancement. For the first time, this paper presents all the details of a real-time, source-location algorithm (LEMSalg) based on time-of-arrival delays derived from a phase transform applied to the generalized cross-power spectrum. It is being used successfully in a representative environment where microphone SNRs are below 0 dB. We have found that many small features are required to make a useful location estimating algorithm work and work well in real-time. We present an experimental evaluation of the current algorithm's performance using data taken with the Huge Microphone Array (HMA) system, which has 448 microphones in a noisy, reverberant environment. Using off-line computation, we also compared the LEMSalg to two alternative methods. The first of these adds local beamforming to the preprocessing of the base algorithm, increasing performance significantly at modest additional computational cost. The second algorithm maximizes the total steered-response power in the same phase transform. While able to derive good position estimates from shorter data runs, this method is two orders of magnitude more computationally expensive and is not yet suitable for real-time use.
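The steered-response-power alternative mentioned above can be sketched compactly: sum PHAT-weighted cross-spectra over all microphone pairs, steered to each candidate location, and pick the maximum. The grid, geometry, and frame length here are illustrative assumptions; the real-time LEMSalg instead localizes from pairwise time delays.

```python
import itertools
import numpy as np

def srp_phat(frames, mic_pos, candidates, fs, c=343.0):
    """frames: (M, N) time frames; candidates: (K, 3) points; returns best point."""
    M, N = frames.shape
    freqs = np.fft.rfftfreq(N, 1.0 / fs)
    spectra = np.fft.rfft(frames, axis=1)
    power = np.zeros(len(candidates))
    for i, j in itertools.combinations(range(M), 2):
        G = spectra[i] * np.conj(spectra[j])
        G /= np.abs(G) + 1e-12                    # PHAT weighting
        for k, p in enumerate(candidates):
            # Expected pairwise delay if the source were at candidate p.
            tau = (np.linalg.norm(p - mic_pos[i]) -
                   np.linalg.norm(p - mic_pos[j])) / c
            power[k] += np.real(np.sum(G * np.exp(2j * np.pi * freqs * tau)))
    return candidates[np.argmax(power)]

# Usage (hypothetical): best = srp_phat(frames, mic_pos, grid_points, fs)
```

The two-orders-of-magnitude cost gap reported above is visible in the structure: the grid search multiplies the per-pair work by the number of candidate points.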

92 citations


Journal ArticleDOI
TL;DR: An extension to the basic algorithm, called basis-point classical MDS (BCMDS), is presented, which handles the case when many of the distances are unavailable, thus yielding a technique that is practical for microphone arrays with a large number of microphones.
Abstract: Classical multidimensional scaling (MDS) is a global, noniterative technique for finding coordinates of points given their interpoint distances. We describe the algorithm and show how it yields a simple, inexpensive method for calibrating an array of microphones with a tape measure (or similar measuring device). We present an extension to the basic algorithm, called basis-point classical MDS (BCMDS), which handles the case when many of the distances are unavailable, thus yielding a technique that is practical for microphone arrays with a large number of microphones. We also show that BCMDS, when combined with a calibration target consisting of four synchronized sound sources, can be used for automatic calibration via time-delay estimation. We evaluate the accuracy of both classical MDS and BCMDS, investigating the sensitivity of the algorithms to noise and to the design parameters to yield insight as to the choice of those parameters. Our results validate the practical applicability of the algorithms, showing that errors on the order of 10-20 mm can be achieved in real scenarios.
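Classical MDS itself is short enough to sketch: square the distances, double-center, and take the top eigenvectors. This recovers coordinates up to a rigid transform; the BCMDS extension for missing distances is not reproduced here.

```python
import numpy as np

def classical_mds(D, dim=3):
    """D: (n, n) matrix of pairwise distances; returns (n, dim) coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]        # keep the largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Example: recover a 6-mic layout from tape-measure style distances.
rng = np.random.default_rng(1)
true_pos = rng.uniform(0, 2, (6, 3))
D = np.linalg.norm(true_pos[:, None, :] - true_pos[None, :, :], axis=-1)
est = classical_mds(D)   # congruent to true_pos up to rotation/translation/reflection
```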

91 citations


Journal Article
TL;DR: A theory and a system for capturing an audio scene and then rendering it remotely are developed and presented, and rigorous error bounds and a Nyquist-like sampling criterion for the representation of the sound field are presented and verified.
Abstract: A theory and a system for capturing an audio scene and then rendering it remotely are developed and presented. The sound capture is performed with a spherical microphone array. The sound field at the location of the array is deduced from the captured sound and is represented using either spherical wave-functions or plane-wave expansions. The sound field representation is then transmitted to a remote location for immediate rendering or stored for later use. The sound renderer, coupled with the head tracker, reconstructs the acoustic field using individualized head-related transfer functions to preserve the perceptual spatial structure of the audio scene. Rigorous error bounds and a Nyquist-like sampling criterion for the representation of the sound field are presented and verified.

90 citations


Proceedings ArticleDOI
18 Mar 2005
TL;DR: The various sensory modalities are processed both individually and jointly and it is shown that the multimodal approach results in significantly improved performance in spatial localization, identification and speech activity detection of the participants.
Abstract: Our long-term objective is to create smart room technologies that are aware of the users' presence and behavior and can become an active, but not an intrusive, part of the interaction. In this work, we present a multimodal approach for estimating and tracking the location and identity of the participants, including the active speaker. Our smart room design contains three user-monitoring systems: four CCD cameras, an omnidirectional camera and a 16-channel microphone array. The various sensory modalities are processed both individually and jointly, and it is shown that the multimodal approach results in significantly improved performance in spatial localization, identification and speech activity detection of the participants.

84 citations


PatentDOI
Stephane Dedieu1, Philippe Moquin1
TL;DR: In this article, a tilt sensor is used to determine the tilt angle of a speakerphone and the surface on which it rests, which can be used to adjust performance of any beamformer(s) where the speakerphone incorporates a microphone array or loudspeaker array.
Abstract: According to the present invention, a tilt sensor is used to determine the tilt angle of a speakerphone and the surface on which it rests. This information is used to optimize both the receive and transmit signals for the chosen tilt angle. The information can also be used to adjust performance of any beamformer(s) where the speakerphone incorporates a microphone array or loudspeaker array. In one embodiment, vibrational data is provided by the tilt sensor for enhancing the receive signal and acoustic echo cancellation.

83 citations


Proceedings ArticleDOI
18 Apr 2005
TL;DR: A system that gives a humanoid robot the ability to localize, separate and recognize simultaneous sound sources and an automatic speech recognizer based on the Missing Feature Theory that recognizes separated sounds in real-time by generating missing feature masks automatically from the post-filtering step.
Abstract: A humanoid robot under real-world environments usually hears mixtures of sounds, and thus three capabilities are essential for robot audition: sound source localization, separation, and recognition of separated sounds. While the first two are frequently addressed, the last one has not been studied so much. We present a system that gives a humanoid robot the ability to localize, separate and recognize simultaneous sound sources. A microphone array is used along with a real-time dedicated implementation of Geometric Source Separation (GSS) and a multi-channel post-filter that gives us a further reduction of interferences from other sources. An automatic speech recognizer (ASR) based on the Missing Feature Theory (MFT) recognizes separated sounds in real-time by generating missing feature masks automatically from the post-filtering step. The main advantage of this approach for humanoid robots resides in the fact that the ASR with a clean acoustic model can adapt to the distortion of the separated sound by consulting the post-filter feature masks. Recognition rates are presented for three simultaneous speakers located at 2 m from the robot. Use of both the post-filter and the missing feature mask results in an average reduction in error rate of 42% (relative).

Patent
22 Dec 2005
TL;DR: In this paper, the first phase difference quantity is based on phase differences between non-repetitive pairs of input signals received by the first microphone and the second microphone, while the second phase difference quantity is based on phase differences between non-repetitive pairs of input signals received by the first microphone and the third microphone.
Abstract: A noise reduction system and a method of noise reduction include a microphone array comprising a first microphone, a second microphone, and a third microphone. Each microphone has a known position and a known directivity pattern. An instantaneous direction-of-arrival (IDOA) module determines a first phase difference quantity and a second phase difference quantity. The first phase difference quantity is based on phase differences between non-repetitive pairs of input signals received by the first microphone and the second microphone, while the second phase difference quantity is based on phase differences between non-repetitive pairs of input signals received by the first microphone and the third microphone. A spatial noise reduction module computes an estimate of a desired signal based on an a priori spatial signal-to-noise ratio and an a posteriori spatial signal-to-noise ratio derived from the first and second phase difference quantities.
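A minimal sketch of the phase-difference quantities the claim describes, assuming one STFT frame from a three-microphone array; the pairing of the first microphone with each of the others follows the claim, while the frame length and FFT parameters are assumptions.

```python
import numpy as np

def idoa_phase_differences(frame, fs=16000):
    """frame: (3, N) one time frame from a three-microphone array.

    Returns (n_bins, 2): per-frequency phase differences
    (mic1 vs mic2, mic1 vs mic3), the two IDOA coordinates.
    """
    X = np.fft.rfft(frame, axis=1)
    d12 = np.angle(X[0] * np.conj(X[1]))   # first quantity: mics 1 and 2
    d13 = np.angle(X[0] * np.conj(X[2]))   # second quantity: mics 1 and 3
    return np.stack([d12, d13], axis=1)
```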

Book ChapterDOI
01 Jan 2005
TL;DR: A more robust technique called the spatially pre-processed speech distortion weighted multichannel Wiener filter (SP-SDW-MWF) is discussed, which takes speech distortion due to signal model errors explicitly into account in its design criterion and encompasses the standard GSC as a special case.
Abstract: In many speech communication applications a microphone array is available nowadays, such that multi-microphone speech enhancement techniques can be used instead of single-microphone speech enhancement techniques. A well-known multi-microphone speech enhancement technique is the generalized sidelobe canceller (GSC), which is however quite sensitive to signal model errors, such as microphone mismatch. This chapter discusses a more robust technique called the spatially pre-processed speech distortion weighted multichannel Wiener filter (SP-SDW-MWF), which takes speech distortion due to signal model errors explicitly into account in its design criterion, and which encompasses the standard GSC as a special case. In addition, a novel frequency-domain criterion for the SDW-MWF is presented, from which several adaptive frequency-domain algorithms, both existing and novel, can be derived for implementing the SDW-MWF. The noise reduction performance and the robustness of these adaptive algorithms are investigated for a hearing aid application. Using experimental results with a small-sized microphone array, it is shown that the SP-SDW-MWF is more robust against signal model errors than the GSC, both in stationary and in changing noise scenarios.
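The SDW-MWF at the heart of this chapter admits a compact per-frequency-bin sketch: with trade-off parameter mu (mu = 1 gives the standard multichannel Wiener filter, larger mu trades speech distortion for more noise reduction), the filter solves (R_x + mu R_v) w = R_x e_ref. The spatial pre-processing stage and the adaptive frequency-domain updates are not reproduced here.

```python
import numpy as np

def sdw_mwf(R_noisy, R_noise, mu=1.0, ref=0):
    """Per-bin SDW-MWF filter w estimating the speech at microphone `ref`.

    R_noisy: (M, M) correlation matrix of the noisy microphone signals
    R_noise: (M, M) correlation matrix estimated during noise-only periods
    """
    R_speech = R_noisy - R_noise       # speech correlation estimate (assumes
                                       # speech and noise are uncorrelated)
    e = np.zeros(R_noisy.shape[0])
    e[ref] = 1.0
    return np.linalg.solve(R_speech + mu * R_noise, R_speech @ e)
```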

Journal ArticleDOI
TL;DR: In this paper, a total solution for frequency-domain blind source separation (BSS) of convolutive mixtures of audio signals, especially speech, is presented, including permutation, scaling, circularity, and complex activation function solutions.
Abstract: This paper overviews a total solution for frequency-domain blind source separation (BSS) of convolutive mixtures of audio signals, especially speech. Frequency-domain BSS performs independent component analysis (ICA) in each frequency bin, and this is more efficient than time-domain BSS. We describe a sophisticated total solution for frequency-domain BSS, including permutation, scaling, circularity, and complex activation function solutions. Experimental results for 2 × 2, 3 × 3, 4 × 4, 6 × 8, and 2 × 2 (moving sources) configurations (#sources × #microphones) in a room are promising.
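One piece of that total solution, the scaling correction, is small enough to sketch: after per-bin ICA yields an unmixing matrix W(f), the minimal-distortion principle rescales its rows by the diagonal of W(f)^-1 so that each separated source is expressed as it would be received at a microphone. The per-bin ICA itself and the permutation and circularity solutions are not reproduced here.

```python
import numpy as np

def fix_scaling(W):
    """Minimal-distortion scaling for one frequency bin.

    W: (n_src, n_mic) square unmixing matrix returned by per-bin ICA.
    """
    A = np.linalg.inv(W)               # estimated mixing matrix
    return np.diag(np.diag(A)) @ W     # scale row i by A[i, i]
```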

Patent
17 Mar 2005
TL;DR: In this article, a system for detecting noise in a signal received by a microphone array and a method for detecting the presence of noise in the received signal were presented; the system also provides for the reduction of noise, improving the signal-to-noise ratio.
Abstract: A system for detecting noise in a signal received by a microphone array and a method for detecting noise in a signal received by a microphone array are disclosed. The system also provides for the reduction of noise in a signal received by a microphone array, together with a method for reducing noise in such a signal. The signal-to-noise ratio in handsfree systems may thereby be improved, particularly in handsfree systems present in a vehicular environment.

Journal ArticleDOI
TL;DR: Experimental results using both small and large-aperture microphone array systems show the acoustically-determined positions to be more consistent than the ones measured directly, and performance can be enhanced by suitable gain compensation.
Abstract: Large-aperture microphone arrays have the potential for providing quality speech acquisition from multiple talkers over a large focal area for such applications as teleconferencing and speech-recognition input. The cost of computing is rapidly approaching the point at which these arrays will be practical in many common environments. However, an important issue is the calibration of an array. We discuss procedures to accurately determine both the coordinates in three-dimensions for the position of each microphone and the individual gains of each microphone/microphone channel. For the positions, we found that calibration by direct measurement, using a surveyor's transit, is simply too error prone, time consuming, and difficult for large arrays and is too inaccurate for small arrays. We have also seen that without the careful matching of inexpensive electret microphones, the channel sensitivities vary by as much as 6 dB and performance can be enhanced by suitable compensation. This paper describes new apparatus and techniques for automatic calibration using acoustic signals. Experimental results using both small and large-aperture microphone array systems show the acoustically-determined positions to be more consistent than the ones measured directly. Gain measurements are somewhat more difficult but gains can be found within 1-2 dB with reasonable care.
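As a toy illustration of the gain side of calibration (not the paper's apparatus), relative channel gains can be read off the RMS of simultaneous recordings of a common source; real acoustic calibration must also compensate the different propagation losses from the source to each microphone, which this sketch ignores.

```python
import numpy as np

def channel_gains_db(recordings, ref=0):
    """recordings: (M, N) simultaneous recordings of one calibration source.

    Returns the gain of each channel in dB relative to channel `ref`.
    """
    rms = np.sqrt(np.mean(recordings.astype(float) ** 2, axis=1))
    return 20.0 * np.log10(rms / rms[ref])
```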

Proceedings ArticleDOI
05 Dec 2005
TL;DR: A three-ring microphone array is described that estimates the horizontal/vertical direction and distance of sound sources and separates multiple sound sources for mobile robot audition; the system can separate three speech sources at different pressure levels without one drowning out the others.
Abstract: This paper describes a three-ring microphone array that estimates the horizontal/vertical direction and distance of sound sources and separates multiple sound sources for mobile robot audition. The arrangement of microphones is simulated, and an optimized pattern consisting of three rings is implemented with 32 microphones. Sound localization and separation are achieved by delay-and-sum beamforming (DSBF) and frequency band selection (FBS). From on-line experiments on horizontal and vertical sound localization, we confirmed that one or two sound sources could be localized with an error of about 5 degrees and 200 to 300 mm at a distance of about 1 m. The off-line experiments on sound separation were evaluated by the power spectra of the separated sounds in each frequency band, and we confirmed that an appropriate frequency band could be selected by DSBF and FBS. The system can separate three speech sources at different pressure levels without one drowning out the others.
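Frequency band selection (FBS) can be sketched as a binary mask over STFT bins: keep a bin for the target source if the beam steered at the target carries more power there than any other steered beam. The DSBF front end and the three-ring geometry are not reproduced; `beam_spectra` below is assumed to hold the beamformer outputs.

```python
import numpy as np

def fbs_mask(beam_spectra, target_idx):
    """beam_spectra: (n_beams, n_bins) magnitude spectra of DSBF outputs.

    Returns a 0/1 mask selecting the bins dominated by the target beam;
    applying it to the target beam's spectrum yields the separated source.
    """
    dominant = np.argmax(np.abs(beam_spectra), axis=0)
    return (dominant == target_idx).astype(float)
```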

Journal ArticleDOI
TL;DR: In this article, experimental results on propagation, coherence, and time-delay estimation from a microphone array in an outdoor aeroacoustic environment were presented, and the achievable accuracy of acoustic TDE using low-cost, commercial off-the-shelf (COTS) speakers and microphones was analyzed.
Abstract: Experimental results are presented on propagation, coherence, and time-delay estimation (TDE) from a microphone array in an outdoor aeroacoustic environment. The primary goal is to understand the achievable accuracy of acoustic TDE using low-cost, commercial off-the-shelf (COTS) speakers and microphones. In addition, through the use of modulated pseudo-noise sequences, the experiment seeks to provide an empirical understanding of the effects of center frequency, bandwidth, and signal duration on TDE effectiveness and compares this to the theoretical expectations established by the Weiss-Weinstein lower bound. Finally, sensor network self-localization is performed using a maximum likelihood estimator and the time-delay estimates. Experimental network localization error is presented as a function of the acoustic calibration signal parameters.

Proceedings ArticleDOI
18 Mar 2005
TL;DR: A generic beamformer design algorithm is presented that makes efficient use of noise models for ambient and instrumental noise and of microphone directivity patterns, and that replaces a multi-dimensional optimization with a much simpler one-dimensional search, computing near-optimal solutions in reasonable time.
Abstract: This paper presents a generic beamformer design algorithm for arbitrary microphone array geometry. It makes efficient use of noise models for ambient and instrumental noise, as well as of microphone directivity patterns. By using a new definition of the target criterion and replacing a multi-dimensional optimization with a much simpler one-dimensional search, we can compute near-optimal solutions in reasonable time. The designed beams achieve noise suppression levels between 10 and 15 dB for microphone arrays with four to eight elements and linear and circular geometries. The fast beamformer real-time processing engine consumes less than 2% of the CPU power of a modern personal computer for a four-microphone array.

01 Sep 2005
TL;DR: This paper presents a hybrid microphone array architecture for processing the signals from a small microphone array used in a headset; the processing chain is computationally effective and provides up to 18 dB of ambient noise suppression.
Abstract: This paper presents a hybrid microphone array architecture used for processing the signals from a small microphone array that is used in a headset. The processing chain consists of fixed end-fire beamforming, adaptive spatial noise reduction and stationary noise suppression. The far-field design algorithm used for the fixed beamformer is adapted to the specifics of the headset by compensating for the directivity of the mouth and the sound diffraction around the head. The spatial noise reduction generalizes the suppression rule for optimal MMSE power noise reduction to multiple dimensions. The algorithm was tested with a headset that used a three element microphone array. It is computationally effective and provides up to 18 dB of ambient noise suppression.

Proceedings ArticleDOI
05 Dec 2005
TL;DR: This paper introduces a 64-channel microphone array system in a room and proposes a new method based on weighted delay-and-sum beamforming to estimate the directivity pattern of a sound source, and proves the effectiveness of the microphone array through sound source tracking with orientation and detection of actual human voices based on directivity patterns.
Abstract: In human-robot communication, a robot should distinguish between voices uttered by a human and those played by a loudspeaker such as on a TV or a radio. This paper addresses detection of actual human voices by using a microphone array as an extension of the auditory function of the robot to support environmental understanding by the robot. We introduce a 64-channel microphone array system in a room and propose a new method based on weighted delay-and-sum beamforming to estimate the directivity pattern of a sound source. The microphone array system localizes a sound source and estimates its directivity pattern. The directivity pattern estimation has two advantages: One is that the system can detect whether the sound source is an actual human voice or not by comparing the estimated directivity pattern with prerecorded directivity patterns. The other is that the heading of the sound source is estimated by detecting the angle with the highest power in the directivity pattern. As a result, we proved the effectiveness of our microphone array through sound source tracking with orientation and detection of actual human voices based on directivity pattern estimation.

Proceedings ArticleDOI
21 Nov 2005
TL;DR: In this article, a hemispherical microphone array for spatial sound acquisition and beamforming is designed and demonstrated for a half 3D acoustic environment where all sound sources are constrained on one side of a rigid plane.
Abstract: We design and demonstrate a hemispherical microphone array for spatial sound acquisition and beamforming. Our design makes use of the acoustic image principle. It is especially appropriate for a half-3D acoustic environment where all sound sources are constrained to one side of a rigid plane. It avoids the difficulties of building a full spherical microphone array yet keeps the advantage of achieving a direction-invariant beampattern. A special microphone layout is designed for simple implementation. We also propose an approach to effectively calibrate the data-independent coefficients of the system. Simulation and experimental results are presented.

Patent
21 Feb 2005
TL;DR: In this paper, the authors proposed a method for detecting noise in a signal received by a microphone array, comprising the steps of receiving microphone signals emanating from at least two microphones, decomposing each microphone signal into frequency subband signals, determining a time dependent measure based on the frequency sub-band signals and evaluating the criterion function according to the predetermined criterion to detect noise.
Abstract: The invention is directed to a method for detecting noise in a signal received by a microphone array, comprising the steps of: receiving microphone signals emanating from at least two microphones of a microphone array; decomposing each microphone signal into frequency sub-band signals; for each microphone signal, determining a time-dependent measure based on the frequency sub-band signals; determining a time-dependent criterion function as a predetermined statistical function of the time-dependent measures; and evaluating the criterion function according to a predetermined criterion to detect noise.

Journal ArticleDOI
TL;DR: In this article, the authors tested the ability of a computer-based passive acoustic location system (ALS) to determine the two-dimensional locations of vocalizing animals using multi-track tape recordings.
Abstract: In this study, we tested the ability of a computer-based passive acoustic location system (ALS) to determine the two-dimensional locations of vocalizing animals. The ALS uses multi-track tape recordings to estimate locations based on arrival time delays between widely-spaced microphones. We tested the accuracy of ALS location estimates using tape recordings of wild free-ranging birds made with 4 microphones placed at the corners of a 40 m square. We compared ALS location estimates for these birds with locations determined by surveying the locations of the perches birds vocalized from. ALS location estimates were typically less than 1 m away from surveyed locations when birds vocalized within the square microphone array, rising to just over 2 m for birds vocalizing from within 25 m beyond the array boundary. Beyond 25 m, accuracy diminished rapidly with increasing distance. ALS accuracy did not depend on the bird species located, but location estimates based on frequency-modulated tonal notes were more...
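A minimal stand-in for the ALS estimator (the abstract does not specify its exact algorithm) is a grid search over candidate 2-D positions that minimizes the mismatch between measured and predicted inter-microphone arrival time delays:

```python
import itertools
import numpy as np

def locate_2d(mic_xy, tdoas, grid, c=343.0):
    """mic_xy: (M, 2); tdoas: dict {(i, j): seconds}; grid: (K, 2) candidates."""
    best, best_err = None, np.inf
    for p in grid:
        d = np.linalg.norm(mic_xy - p, axis=1)     # distance to each mic
        err = sum((tdoas[(i, j)] - (d[i] - d[j]) / c) ** 2
                  for (i, j) in tdoas)
        if err < best_err:
            best, best_err = p, err
    return best

# Example: 4 mics at the corners of a 40 m square, as in the study.
mics = np.array([[0, 0], [40, 0], [40, 40], [0, 40]], dtype=float)
src = np.array([12.0, 20.0])                       # hypothetical true position
d = np.linalg.norm(mics - src, axis=1)
tdoas = {(i, j): (d[i] - d[j]) / 343.0
         for (i, j) in itertools.combinations(range(4), 2)}
grid = np.array(list(itertools.product(np.linspace(0, 40, 81),
                                       np.linspace(0, 40, 81))))
est = locate_2d(mics, tdoas, grid)                 # close to src within grid step
```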

01 Mar 2005
TL;DR: A dereverberation algorithm for improving automatic speech recognition (ASR) results with minimal CPU overhead is presented; as the reverberation tail hurts ASR the most, late reverberation is reduced via gain-based spectral subtraction.
Abstract: In this paper we present a dereverberation algorithm for improving automatic speech recognition (ASR) results with minimal CPU overhead. As the reverberation tail hurts ASR the most, late reverberation is reduced via gain-based spectral subtraction. We use a multi-band decay model with an efficient method to update it in real time. In reverberant environments the multi-channel version of the proposed algorithm reduces word error rates (WER) up to one half of the way between those of a microphone array only and a close-talk microphone. The four-channel implementation requires less than 2% of the CPU power of a modern computer.

Introduction: The need to present clean sound inputs to today's speech recognition engines has fostered huge amounts of research into areas of noise suppression, microphone array processing, acoustic echo cancellation and methods for reducing the effects of acoustic reverberation. Reducing reverberation through deconvolution (inverse filtering) is one of the most common approaches. The main problem is that the channel must be known or very well estimated for successful deconvolution. The estimation is done in the cepstral domain [1] or on envelope levels [2]. Multi-channel variants use the redundancy of the channel signals [3] and frequently work in the cepstral domain [4]. Blind dereverberation methods seek to estimate the input(s) to the system without explicitly computing a deconvolution or inverse filter. Most of them employ probabilistic and statistically based models [5]. Dereverberation via suppression and enhancement is similar to noise suppression. These algorithms either try to suppress the reverberation, enhance the direct-path speech, or both. There is no channel estimation and no signal estimation, either. Usual techniques are long-term cepstral mean subtraction [6], pitch enhancement [7], and LPC analysis [8] in single or multi-channel implementations. The most common issues with the preceding methods are slow reaction when reverberation changes, robustness to noise, and computational requirements.

Modeling and assumptions: We convolved a clean speech signal with a typical room response function and processed it through our ASR engine, cutting the length of the response function after some point. The results are shown in Figure 1. The early reverberation has practically no effect on the ASR results, most probably due to cepstral mean subtraction (CMS) in the ASR engine front end. The CMS compensates for the constant part of the input channel response and removes the early reverberation. The reverberation has a noticeable effect on WER between 50 ms and RT30. In this time interval the reverberation behaves more as non-stationary, uncorrelated decaying noise R(f):

Y(f) = X(f) + R(f)    (1)

We assume that the reverberation energy in this time interval decays exponentially and is the same at every point of the room (i.e., it is diffuse). Our decay model is frequency dependent:
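In the spirit of the paper's gain-based spectral subtraction, the sketch below computes per-bin suppression gains under an exponential, frequency-dependent decay model consistent with Eq. (1). The on-line decay estimation and the multi-channel combination are not reproduced; `decay_per_frame` and `delay_frames` are assumed inputs.

```python
import numpy as np

def dereverb_gains(power_spec, decay_per_frame, delay_frames=6, floor=0.1):
    """power_spec: (n_frames, n_bins) STFT power of the reverberant signal.

    Late reverberation in frame t is modeled as the power `delay_frames`
    earlier (roughly the 50 ms boundary noted above) attenuated by the
    per-band exponential decay factor `decay_per_frame` (shape (n_bins,)).
    """
    n_frames, _ = power_spec.shape
    gains = np.ones_like(power_spec)
    for t in range(delay_frames, n_frames):
        late = decay_per_frame ** delay_frames * power_spec[t - delay_frames]
        # Spectral-subtraction gain in the power domain, floored to limit
        # musical noise; apply sqrt(gains) to the STFT magnitudes.
        gains[t] = np.maximum(1.0 - late / (power_spec[t] + 1e-12), floor)
    return gains
```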

Patent
Juin-Hwey Chen1
24 May 2005
TL;DR: In this article, the authors proposed a method of processing audio signals from a wireless telephone having an array of microphones and a digital signal processor (DSP), which can detect a direction of arrival (DOA) of a sound wave emanating from the mouth of a user based on the audio signals.
Abstract: A wireless telephone having an array of microphones and a digital signal processor (DSP) and a method of processing audio signals from a wireless telephone having an array of microphones and a DSP. The wireless telephone includes an array of microphones and a DSP. Each microphone in the array is configured to receive sound waves emanating from the surrounding environment and to generate an audio signal corresponding thereto. The DSP is coupled to the array of microphones. The DSP is configured to receive the audio signals from the array of microphones, to detect a direction of arrival (DOA) of a sound wave emanating from the mouth of a user based on the audio signals and to adaptively combine the audio signals based on the DOA to produce a first audio output signal.

Patent
Juin-Hwey Chen1
30 Sep 2005
TL;DR: In this article, a telephone equipped with multiple microphones that provides improved performance during operation of the telephone in a speakerphone mode is presented, where the multiple microphones can be used to improve voice activity detection, which in turn, can improve echo cancellation.
Abstract: The present invention is directed to a telephone equipped with multiple microphones that provides improved performance during operation of the telephone in a speakerphone mode. For example, the multiple microphones can be used to improve voice activity detection, which in turn, can improve echo cancellation. In addition, the multiple microphones can be configured as an adaptive microphone array and used to reduce the effects of (i) room reverberation, when a near-end user is speaking, and/or (ii) acoustic echo, when a far-end user is speaking.

Proceedings ArticleDOI
18 Mar 2005
TL;DR: A novel method is presented that calculates a smaller number of test points using an efficient closed-form localization algorithm, which significantly reduces the number of calculations, while still remaining robust in acoustical environments.
Abstract: The location of an acoustical source can be found robustly using the steered response power phase transform (SRP-PHAT) algorithm. However, SRP-PHAT can be computationally expensive, requiring a search over a large number of candidate locations. The required spacing between these locations depends on the sampling rate, microphone array geometry, and source location. In this work, a novel method is presented that calculates a smaller number of test points using an efficient closed-form localization algorithm. This method significantly reduces the number of calculations while still remaining robust in acoustical environments.

Patent
29 Apr 2005
TL;DR: In this article, a system automatically determines an equalizing filter characteristic for a communication system within a vehicle, which includes a loudspeaker and a microphone or microphone array, based on a predetermined test signal and the received test signal.
Abstract: A system automatically determines an equalizing filter characteristic for a communication system within a vehicle. The communication system includes a loudspeaker and a microphone or microphone array. The system transmits a predetermined test signal through the loudspeaker and receives the test signal through the microphone or microphone array. Based on the predetermined test signal and the received test signal, a transfer function is developed. The equalizing filter characteristic is then developed from the transfer function.

Patent
09 Aug 2005
TL;DR: In this paper, a hand-held remote control for home and office appliances, such as TV, projector, DVD/CD, VCR, sound system, and many others, is presented.
Abstract: The present invention provides a voice-operated hand-held remote control to be used with home and office appliances, such as a TV, projector, DVD/CD player, VCR, sound system, and many others. A user can use voice commands through a remote control of this invention to execute control functions over the appliances. To achieve this, the remote control of the present invention comprises at least: (1) a button for both muting and push-to-talk; (2) a microphone or microphone array; (3) an automatic speech recognizer; (4) a digital signal microprocessor; (5) memory; and (6) a signal transmitter.