
Showing papers on "Microphone array published in 2000"


Patent
19 Oct 2000
TL;DR: In this article, a natural language interface control system for operating a plurality of devices (114) consists of a first microphone array (108), a feature extraction module (202) coupled to the first microphone array, and a speech recognition module (204) coupled to the feature extraction module, wherein the speech recognition module utilizes hidden Markov models.
Abstract: A natural language interface control system (206) for operating a plurality of devices (114) consists of a first microphone array (108), a feature extraction module (202) coupled to the first microphone array, and a speech recognition module (204) coupled to the feature extraction module, wherein the speech recognition module utilizes hidden Markov models. The system also comprises a natural language interface module (222) coupled to the speech recognition module (204) and a device interface (210) coupled to the natural language interface module (222), wherein the natural language interface module is for operating a plurality of devices coupled to the device interface based upon non-prompted, open-ended natural language requests from a user.

342 citations


BookDOI
Jacob Benesty
01 Mar 2000
TL;DR: This book covers multi-channel sound, acoustic echo cancellation, and multi-channel time-domain adaptive filtering, as well as an introduction to blind source separation of speech signals.
Abstract: List of Figures. List of Tables. Preface. Contributing Authors.
1. An Introduction to Acoustic Echo and Noise Control (S.L. Gay, J. Benesty).
Part I: Mono-Channel Acoustic Echo Cancellation.
2. The Fast Affine Projection Algorithm (S.L. Gay).
3. Subband Acoustic Echo Cancellation Using the FAP-RLS Algorithm: Fixed-Point Implementation Issues (M. Ghanassi, B. Champagne).
4. Real-Time Implementation of the Exact Block NLMS Algorithm for Acoustic Echo Control in Hands-Free Telephone Systems (B.H. Nitsch).
5. Double-Talk Detection Schemes for Acoustic Echo Cancellation (T. Gansler, J. Benesty, S.L. Gay).
Part II: Multi-Channel Acoustic Echo Cancellation.
6. Multi-Channel Sound, Acoustic Echo Cancellation, and Multi-Channel Time-Domain Adaptive Filtering (J. Benesty, T. Gansler, P. Eneroth).
7. Multi-Channel Frequency-Domain Adaptive Filtering (J. Benesty, D.R. Morgan).
8. A Real-Time Stereophonic Acoustic Subband Echo Canceler (P. Eneroth, S.L. Gay, T. Gansler, J. Benesty).
Part III: Noise Reduction Techniques with a Single Microphone.
9. Subband Noise Reduction Methods for Speech Enhancement (E.J. Diethorn).
Part IV: Microphone Arrays.
10. Superdirectional Microphone Arrays (G.W. Elko).
11. Microphone Arrays for Video Camera Steering (Yiteng Huang, J. Benesty, G.W. Elko).
12. Nonlinear, Model-Based Microphone Array Speech Enhancement (M.S. Brandstein, S.M. Griebel).
Part V: Virtual Sound.
13. 3D Audio and Virtual Acoustical Environment Synthesis (Jiashu Chen).
14. Virtual Sound Using Loudspeakers: Robust Acoustic Crosstalk Cancellation (D.B. Ward, G.W. Elko).
Part VI: Blind Source Separation.
15. An Introduction to Blind Source Separation of Speech Signals (J. Benesty).
Index.

315 citations


Proceedings Article
01 May 2000
TL;DR: LREC2000: the 2nd International Conference on Language Resources and Evaluation, May 31 - June 2, 2000, Athens, Greece.
Abstract: LREC2000: the 2nd International Conference on Language Resources and Evaluation, May 31 - June 2, 2000, Athens, Greece.

259 citations


Proceedings ArticleDOI
05 Jun 2000
TL;DR: This paper describes a new blind signal separation method using the directivity patterns of a microphone array; it improves the SNR of degraded speech by about 16 dB under non-reverberant conditions and by 8.7 dB when the reverberation time is 184 ms.
Abstract: This paper describes a new blind signal separation method using the directivity patterns of a microphone array. In this method, to handle the arrival lags between microphones, the inverses of the mixing matrices are calculated in the frequency domain so that the separated signals are mutually independent. Since the calculations are carried out in each frequency bin independently, two problems arise: (1) permutation ambiguity among the sound sources, and (2) arbitrary scaling of each source gain. In this paper, we propose a new solution in which directivity patterns are explicitly used to estimate each sound source direction. Signal separation experiments show that the proposed method improves the SNR of degraded speech by about 16 dB under non-reverberant conditions, by 8.7 dB when the reverberation time is 184 ms, and by 5.1 dB when the reverberation time is 322 ms.

212 citations
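The per-bin permutation fix described in this abstract can be illustrated with a short sketch. The snippet below is a hypothetical reconstruction, not the authors' implementation: given per-bin 2x2 unmixing matrices for a two-microphone array with an assumed spacing, it estimates each output's null direction from its directivity pattern and reorders rows so the nulls stay consistent across frequency bins.

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def null_angle(w_row, freq, spacing, angles):
    """Angle (rad) at which this output's directivity pattern
    |w0 + w1 * exp(j*2*pi*f*d*sin(theta)/c)| has its minimum (null)."""
    phase = np.exp(1j * 2 * np.pi * freq * spacing * np.sin(angles) / C)
    return angles[np.argmin(np.abs(w_row[0] + w_row[1] * phase))]

def align_permutations(W, freqs, spacing):
    """Reorder the rows of each per-bin 2x2 unmixing matrix so that every
    output keeps a consistent null direction across frequency bins."""
    angles = np.linspace(-np.pi / 2, np.pi / 2, 181)
    ref = [null_angle(W[0][i], freqs[0], spacing, angles) for i in range(2)]
    aligned = [W[0]]
    for k in range(1, len(freqs)):
        nulls = [null_angle(W[k][i], freqs[k], spacing, angles) for i in range(2)]
        keep = abs(nulls[0] - ref[0]) + abs(nulls[1] - ref[1])
        swap = abs(nulls[0] - ref[1]) + abs(nulls[1] - ref[0])
        aligned.append(W[k] if keep <= swap else W[k][::-1])  # swap rows if needed
    return aligned

# toy input: random complex unmixing matrices for four frequency bins
rng = np.random.default_rng(0)
freqs = [500.0, 1000.0, 1500.0, 2000.0]
W = [rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)) for _ in freqs]
print(len(align_permutations(W, freqs, spacing=0.04)), "bins aligned")
```

The scaling ambiguity (problem 2 in the abstract) would be resolved separately, for example by normalizing each aligned row; that step is omitted here.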


PatentDOI
TL;DR: In this article, a voice control system is proposed whose microphone array is distributed across several appliances, since the dimensions of an individual consumer electronics appliance limit the achievable distance between its microphones; the microphone signals are transmitted to a central speech recognition unit, advantageously via a bidirectional network based on an IEEE 1394 bus.
Abstract: Voice control systems are used in diverse technical fields. In this case, the spoken words are detected by one or more microphones and then fed to a speech recognition system. In order to enable voice control even from a relatively great distance, the voice signal must be separated from interfering background signals. This can be effected by spatial separation using microphone arrays comprising two or more microphones. In this case, it is advantageous for the individual microphones of the microphone array to be distributed spatially over the greatest possible distance. In an individual consumer electronics appliance, however, the distances between the individual microphones are limited on account of the dimensions of the appliance. Therefore, the voice control system according to the invention comprises a microphone array having a plurality of microphones which are distributed between different appliances, in which case the signals generated by the microphones can be transmitted to the central speech recognition unit, advantageously via a bidirectional network based on an IEEE 1394 bus.

133 citations


Proceedings ArticleDOI
05 Jun 2000
TL;DR: This paper proposes a new method which suppresses the undesired cross-correlation by synchronous addition of CSP coefficients derived from multiple microphone pairs and shows that the proposed method improves the localization accuracy as the number of synchronous additions increases.
Abstract: Accurate localization of multiple sound sources is indispensable for microphone array-based high quality sound capture. For single sound source localization, the CSP (cross-power spectrum phase analysis) method has been proposed. The CSP method localizes a sound source as the crossing point of sound directions estimated using different microphone pairs. However, when localizing multiple sound sources, the CSP method suffers degraded localization accuracy due to cross-correlation among the different sound sources. To solve this problem, this paper proposes a new method which suppresses the undesired cross-correlation by synchronous addition of CSP coefficients derived from multiple microphone pairs. Experimental results in a real room showed that the proposed method improves the localization accuracy as the number of synchronous additions increases.

105 citations
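The CSP coefficient of a microphone pair is the inverse transform of the phase-normalized cross-power spectrum (elsewhere known as GCC-PHAT). A minimal sketch of the synchronous-addition step, on synthetic signals and with assumed FFT and pairing choices, could look like this:

```python
import numpy as np

def csp(x1, x2, n_fft=1024):
    """CSP (cross-power spectrum phase) coefficients of one microphone pair:
    the inverse FFT of the phase of the cross-power spectrum (GCC-PHAT)."""
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12          # keep phase information only
    return np.fft.irfft(cross, n_fft)

def synchronous_addition(signals, pairs, n_fft=1024):
    """Sum the CSP coefficients of several microphone pairs with identical
    spacing; cross-correlation terms between different sources tend to
    cancel, sharpening the peaks that mark the true source delays."""
    acc = np.zeros(n_fft)
    for i, j in pairs:
        acc += csp(signals[i], signals[j], n_fft)
    return acc

# toy example: 4 equally spaced microphones, one source, 5-sample delay per gap
rng = np.random.default_rng(1)
s = rng.standard_normal(4096)
mics = [np.roll(s, 5 * k) + 0.1 * rng.standard_normal(4096) for k in range(4)]
acc = synchronous_addition(mics, pairs=[(0, 1), (1, 2), (2, 3)])
peak = int(np.argmax(acc))
lag = peak - len(acc) if peak > len(acc) // 2 else peak   # wrap negative lags
print("estimated inter-microphone delay:", lag, "samples")
```

With one clean source the summed peak simply reinforces; the benefit the paper reports appears when several sources contribute competing cross-terms.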


Journal ArticleDOI
TL;DR: It is shown that the optimum weights obtained for microphone spacing equal to a half wavelength are real, aside from the propagation delay, and inversely proportional to the distance from each microphone to the focal point.
Abstract: This paper describes the application of array optimization techniques in the near field of a microphone array. It is shown that the optimum weights obtained for microphone spacing equal to a half wavelength are real, aside from the propagation delay, and inversely proportional to the distance from each microphone to the focal point. When the microphone spacing is less than a half wavelength, the optimum weights are super-directive, requiring both an amplitude and phase adjustment in addition to delay-and-sum beamforming. In the super-directive regime, the maximum array gain increases as the distance to the focal point decreases. The technique is suitable for applications where the source of interest is located in the near-field region but interfering sources are located farther from the array.

72 citations
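The half-wavelength-spacing result lends itself to a compact sketch: delay compensation plus a 1/r amplitude taper. The code below is a minimal delay-and-sum illustration under an assumed geometry, not the paper's optimization procedure.

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def nearfield_weights(mics, focal_point, freq):
    """Half-wavelength-spacing case from the paper: aside from the
    propagation delay (the phase term), the optimum weights are real and
    inversely proportional to the microphone-to-focal-point distance."""
    r = np.linalg.norm(mics - focal_point, axis=1)
    w = np.exp(-1j * 2 * np.pi * freq * r / C) / r   # delay compensation * (1/r)
    return w / np.linalg.norm(w)

# 8-element linear array with half-wavelength spacing at 1 kHz
freq = 1000.0
d = C / freq / 2
mics = np.stack([(np.arange(8) - 3.5) * d, np.zeros(8)], axis=1)
focus = np.array([0.0, 0.5])                         # talker 0.5 m broadside
w = nearfield_weights(mics, focus, freq)

# the response to a source at the focal point is coherent (all terms align)
r = np.linalg.norm(mics - focus, axis=1)
arrival = np.exp(-1j * 2 * np.pi * freq * r / C)     # phase of the arriving wave
print("focal-point response magnitude:", np.abs(np.vdot(w, arrival)))
```

For spacings below a half wavelength, the paper's super-directive weights would replace the simple 1/r taper with a jointly optimized amplitude and phase adjustment.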


Journal ArticleDOI
TL;DR: This article shows how to mitigate the loss of array gain and robustness that arises when acquiring speech from a near-field talker in the presence of a strong source of interference located farther from the array in the same direction, by selection of the constrained response level.
Abstract: This article addresses the problem of maximizing the near-field gain of a microphone array subject to a constraint on the far-field beampattern. The problem arises when acquiring speech from a near-field talker in the presence of a strong source of interference located farther from the array. When the angles of incidence from the near-field target and the far-field interference are identical, enforcing a null constraint in the interference direction reduces array gain and robustness. This article shows how to mitigate this effect by selection of the constrained response level. A suitable selection is to force the beampattern in the interference direction to be proportional to the unconstrained beampattern. The proportionality constant can then be used to trade off interference reduction and array gain. Specific numerical examples are provided.

63 citations
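One concrete way to impose a constrained response level is an LCMV-style design with two linear constraints: unit response at the near-field focal point and a chosen level g toward the far-field interferer. The sketch below is an illustrative reconstruction under assumed geometry and a white-noise covariance, not the authors' exact formulation; sweeping g shows the trade-off between interference rejection and gain.

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def steer_near(mics, point, freq):
    """Near-field steering vector: spherical spreading (1/r) plus delay."""
    r = np.linalg.norm(mics - point, axis=1)
    return np.exp(-1j * 2 * np.pi * freq * r / C) / r

def steer_far(mics, angle, freq):
    """Far-field (plane-wave) steering vector for a given incidence angle."""
    tau = mics[:, 0] * np.sin(angle) / C
    return np.exp(-1j * 2 * np.pi * freq * tau)

def constrained_weights(d_near, d_far, g):
    """Minimum-norm weights subject to two linear constraints: unit response
    at the near-field focal point and response level g toward the far-field
    interferer. Smaller |g| rejects the interferer harder at the cost of
    array gain and robustness."""
    Cmat = np.column_stack([d_near, d_far])           # constraint matrix
    f = np.array([1.0, g], dtype=complex)             # desired responses
    return Cmat @ np.linalg.solve(Cmat.conj().T @ Cmat, f)

freq = 1000.0
mics = np.stack([np.linspace(-0.15, 0.15, 6), np.zeros(6)], axis=1)
d_n = steer_near(mics, np.array([0.0, 0.4]), freq)    # talker 0.4 m away
d_f = steer_far(mics, 0.0, freq)                      # interferer, same angle
for g in (0.0, 0.05, 0.2):
    w = constrained_weights(d_n, d_f, g)
    wng = abs(np.vdot(w, d_n)) ** 2 / np.vdot(w, w).real
    print(f"g = {g:4.2f}: white-noise gain = {wng:.4f}")
```

Setting g = 0 reproduces the hard null the article argues against; the article's specific suggestion, making the constrained level proportional to the unconstrained beampattern, corresponds to one particular choice of g.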


Proceedings ArticleDOI
01 Jun 2000
TL;DR: The array is shown to have a marked effect on the recognition results in a high noise office environment, particularly where there is a high level of undesired speech.
Abstract: This paper assesses the effectiveness of a microphone array system in providing robust hands-free speech recognition on a computer workstation. Near-field superdirectivity (NFSD) is used for the beamforming, and the effect of adding a post-filter is investigated. In the experiments, the speaker is located directly in front of the computer monitor at a distance of 60 cm and the array is designed to fit across the top of a standard 17 inch monitor. To test the proposed system, connected digit strings from the TIDIGITS database are used. The array is shown to have a marked effect on the recognition results in a high noise office environment, particularly where there is a high level of undesired speech.

58 citations


PatentDOI
TL;DR: In this article, a system for suppressing unwanted signals in steerable microphone arrays is proposed: the lobes of a steerable microphone array are monitored to identify lobes having large speech content and low noise content.
Abstract: A system for suppressing unwanted signals in steerable microphone arrays. The lobes of a steerable microphone array are monitored, to identify lobes having large speech content and low noise content. One of the identified lobes is then used to deliver speech to a speech recognition system, as at a self-service kiosk.

54 citations



Journal ArticleDOI
S. Brühl, A. Röder
TL;DR: In this paper, the authors present a method which allows construction of an acoustic source model based on the analysis of microphone array measurements during a train pass-by; the conventional array beamforming technique is used as a kind of pre-processing or first-order analysis, and the true source strengths are then calculated in a second step by a back-projection method.

PatentDOI
Koichiro Mizushima
TL;DR: In this paper, a method and apparatus are described for obtaining information, including angular direction, for one or more sound sources; a sound source direction estimation section performs frequency-domain and time-domain processing of sets of output signals from a microphone array to derive successive estimated angular directions of each of the sound sources.
Abstract: A method and apparatus enabling information including respective angular directions to be obtained for one or more sound sources includes a sound source direction estimation section for frequency-domain and time-domain processing of sets of output signals from a microphone array to derive successive estimated angular directions of each of the sound sources. The estimated directions can be utilized by a passage detection section to detect when a sound source is currently moving past the microphone array and the direction of the sound source at the time point when such passage detection is achieved, and by a motion velocity detection section which is triggered by such passage detection to calculate the velocity of the passing sound source using successively obtained estimated directions. In addition, directivity of the microphone array can be produced, oriented along the direction of a sound source which is moving past the microphone array, enabling accurate monitoring of the sound levels of the respective sound sources.

Journal ArticleDOI
TL;DR: An efficient parameterization for the nearfield broadband beamforming problem with a single parameter to focus the beamformer to a desired operating radius and another set of parameters to control the actual broadband beampattern shape is introduced.
Abstract: This paper introduces an efficient parameterization for the nearfield broadband beamforming problem with a single parameter to focus the beamformer to a desired operating radius and another set of parameters to control the actual broadband beampattern shape. The parameterization is based on an orthogonal basis set of elementary beampatterns by which an arbitrary beampattern can be constructed. A set of elementary beamformers are then designed for each elementary beampattern and the desired beamformer is constructed by summing the elementary beamformers with frequency and source-array distance dependent weights. An important consequence of our result is that the beamformer can be factored into three levels of filtering: (i) beampattern independent elementary beamformers; (ii) beampattern shape dependent filters; and (iii) radial focusing filters where a single parameter can be adjusted to focus the array to a desired radial distance from the array origin. As an illustration the method is applied to the problem of producing a practical array design that achieves a frequency invariant beampattern over the frequency range of 1:10 (which is suitable for speech acquisition using a microphone array), and with the array focused either to farfield or nearfield where at the lowest frequency the radial distance to the source is only three wavelengths.

Journal ArticleDOI
TL;DR: This work proposes a unified neural-network-based source localization technique, which is simultaneously applicable to wide-band and narrow-band signal sources that are in the far field or near field of a microphone array and exploits a multilayer perceptron feedforward neural network structure.
Abstract: Locating and tracking a speaker in real time using microphone arrays is important in many applications such as hands-free video conferencing, speech processing in large rooms, and acoustic echo cancellation. A speaker can be moving from the far field to the near field of the array, or vice versa. Many neural-network-based localization techniques exist, but they are applicable to either far-field or near-field sources, and are computationally intensive for real-time speaker localization applications because of the wide-band nature of the speech. We propose a unified neural-network-based source localization technique, which is simultaneously applicable to wide-band and narrow-band signal sources that are in the far field or near field of a microphone array. The technique exploits a multilayer perceptron feedforward neural network structure and forms the feature vectors by computing the normalized instantaneous cross-power spectrum samples between adjacent pairs of sensors. Simulation results indicate that our technique is able to locate a source with an absolute error of less than 3.5° at a signal-to-noise ratio of 20 dB and a sampling rate of 8000 Hz at each sensor.
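The feature construction this abstract describes can be sketched briefly. The frame length, FFT size, and magnitude normalization below are assumptions for illustration; the MLP itself is omitted.

```python
import numpy as np

def cross_power_features(frames, n_fft=256):
    """Feature vector for the MLP localizer: normalized instantaneous
    cross-power spectrum samples between adjacent pairs of sensors."""
    spectra = np.fft.rfft(frames, n_fft, axis=1)        # shape (M, n_fft//2+1)
    feats = []
    for m in range(frames.shape[0] - 1):
        cps = spectra[m] * np.conj(spectra[m + 1])      # instantaneous cross-power
        cps /= np.abs(cps) + 1e-12                      # magnitude normalization
        feats.extend([cps.real, cps.imag])              # real-valued network input
    return np.concatenate(feats)

# toy frame: 4 microphones, 256 samples each at an 8000 Hz sampling rate
rng = np.random.default_rng(2)
frames = rng.standard_normal((4, 256))
x = cross_power_features(frames)
print("feature dimension:", x.shape[0])   # (M-1) * 2 * (n_fft//2 + 1)
```

Because only the phase of the cross-power spectrum survives the normalization, the same features cover wide-band and narrow-band sources, which is consistent with the unified treatment the paper claims.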

Proceedings ArticleDOI
30 Jul 2000
TL;DR: An automatic video-conferencing system is proposed which employs acoustic source localization, video face tracking and pose estimation, and multi-channel speech enhancement.
Abstract: An automatic video-conferencing system is proposed which employs acoustic source localization, video face tracking and pose estimation, and multi-channel speech enhancement. The video portion of the system tracks talkers by utilizing source motion, contour geometry, color data and simple facial features. Decisions involving which camera to use are based on an estimate of the head's gazing angle. This head pose estimation is achieved using a very general head model which employs hairline features and a learned network classification procedure. Finally, a wavelet microphone array technique is used to create an enhanced speech waveform to accompany the recorded video signal. The system presented in this paper is robust to both visual clutter (e.g. ovals in the scene of interest which are not faces) and audible noise (e.g. reverberations and background noise).

Patent
30 Nov 2000
TL;DR: In this article, a wide frequency band micromachined microphone is described, including a plurality of micromachined cells of the type including electrodes carried by a membrane supported above a common electrode, with conductive lines interconnecting the electrodes.
Abstract: A wide frequency band micromachined microphone including a plurality of micromachined cells of the type including electrodes carried by a membrane supported above a common electrode with conductive lines interconnecting the electrodes is described. A method of operating a microphone array is also described.

Journal Article
TL;DR: In this paper, the authors discuss the options available to the sound engineer in the choice of segment coverage by the Front Triplet and Back Pair, and how Critical Linking can be obtained in relation to the Lateral Segments.
Abstract: The design of a microphone array for multichannel sound recording involves the manipulation of many interrelated parameters: Segment Coverage, Electronic Time and Intensity Offset, Microphone Position Offset. This paper discusses the options available to the sound engineer in the choice of segment coverage by the Front Triplet and Back Pair, and how Critical Linking can be obtained in relation to the Lateral Segments. A basic design procedure is illustrated that will enable the sound recording engineer to design the microphone array needed for a specific reproduction configuration.

Introduction. The basic parameters of a Multichannel Microphone Array were presented at the 107th AES Convention in New York in a paper (2) entitled "Microphone Array Analysis for Multichannel Sound Recording" (preprint 4997). Parameters such as Segment Coverage, Electronic Time and Intensity Offset, and Microphone Position Offset were defined, and their influence on the design of a microphone array was discussed. However, the design of a specific microphone array to meet the needs of a particular sound recording environment was beyond the scope of that first paper. The present paper tries to meet, to some extent, the needs of an individual sound engineer to design a microphone array taking into account the particular musical and acoustic environment of a specific recording. Each stage in the design procedure is discussed.

Front Triplet Coverage. The reproduction of the front sound stage is probably the most important stage in the design of a specific Multichannel Microphone Array (MMA). In general the direct sound from most sound sources encountered will be covered by this front triplet; however, the configuration of the three microphones will also considerably condition the difficulties encountered in obtaining critical linking with the lateral and back segments.

Back Pair Coverage. This is usually dependent on the musical sound source configuration. Is the sound source completely surrounding the microphone array, or is the disposition more traditional? In the second case, the coverage should be chosen in relation to the particular acoustic environment; the artistic choice of the sound engineer will also influence the type of envelopment considered necessary in the back reproduction segment. […] better than the traditional stereophonic reproduction. However, the front perception zone of reproduction of about 60° seems to satisfy approximately our needs as to the angular size of the main sound stage, but with multichannel continuous sound field recording and reproduction we have the freedom to widen this sound stage if we feel the need. A multichannel surround sound reproduction system is also capable of reproducing realistically the multitude of early reflections and the surrounding reverberation, which are mainly responsible for the feeling of "space" in a good recording. So great care must be taken in the positioning of the sound source and the microphone array, in relation to the acoustic environment, to exploit this effect to a maximum. The approach is obviously different in the case of a completely surrounding sound source, either where one needs to record the natural sound environment of, say, a forest, or in the case of a certain number of musical works written specifically with a view to creating a surround sound environment. The microphone system is then, by definition, placed in the middle of the surrounding sound source. In this type of situation the smooth reproduction of the sound field means that each sound reproduction segment covered by a pair of loudspeakers must correspond exactly to the same segment of the original sound field. But the realistic reproduction of sound perspective depends entirely on our ability to place each sound source at the required distance from the microphone system. This means, unfortunately, that we must accept as inevitable the distortion of perspective in the recording of natural outdoor sound environments; fortunately, the perception of the exact distance of this type of sound source is rarely critical.

Front Triplet Design. The choice of the Front Triplet Coverage Angle is determined basically by the position of the microphone system and the angular width of the sound source as "seen" by the microphone array, much the same as with a two-channel stereophonic microphone array. However, we have a degree more liberty in the choice of the coverage angle than was the case in stereo. The coverage angle can either be within the angle occupied by the sound source, in which case it is the lateral segments that will reproduce the extremities of the sound source, or, as with stereo, the coverage angle can be greater than the angle covered by the sound source, the reproduction of the direct sound from the source then being within the two front segments. The limit case in which the coverage angle is equal to the sound source angle is of course also possible, but does not suffer from that claustrophobic impression often given in stereophonic recordings when not enough attention has been paid to producing a certain quantity of "side-room". Side-room can be considered as the "space" left between the extremities of the reproduced main sound stage and the limits of the possible stereophonic image created by the position of the loudspeakers. This is very similar to the headroom that is necessary for good balance in a picture. The abrupt limit to the sound field of stereophonic reproduction is not perceived in a multichannel reproduction system, due to the continuation of sound field reproduction obtained on condition that the lateral segments are Critically Linked to the Front Triplet Coverage. The "side-room" in a multichannel system is no longer necessary, as we have all the room that we need! We therefore have three choices for the Front Coverage Angle: (a) Front Coverage Angle > Angle of the Sound Source, in which case the sound source will be reproduced within the sound stage generated by the front three loudspeakers; (b) Front Coverage Angle = Angle of the Sound Source, in which case the sound source will fill the whole of the front sound stage; (c) Front Coverage Angle < Angle of the Sound Source, in which case we will need the lateral segments to continue the reproduction of the sound source in the left and right lateral segments, generated by the left front and left back loudspeakers, and the right front and right back loudspeakers respectively. Once we have determined which choice to make for the Front Coverage Angle, and knowing the position of the microphone system, we can consider the actual combinations of distance and angle between the microphones that we need to adopt. We will consider two specific examples of Coverage Angle.

Table 1: the Front Triplet has a total Coverage Angle of 120°, meaning that the Left Front Pair and the Right Front Pair must each cover an angle of 60°. Table 2: the Front Triplet has a total Coverage Angle of 110°, meaning that the Left Front Pair and the Right Front Pair must each cover an angle of 55°.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: An improved complementary beamforming microphone array with a new noise adaptation is described that improves the signal-to-noise ratio of degraded speech by more than 6 dB and performs more than 18% better in word recognition rates when the interfering noise is two speakers.
Abstract: This paper describes an improved complementary beamforming microphone array with a new noise adaptation. Complementary beamforming is based on two types of beamformers designed to obtain complementary directivity patterns. In this system, the two directivity patterns of the beamformers are adapted to the noise directions so that the expectation values of each noise power spectrum are minimized. Using this technique, we can realize directional nulls for each noise source even when the number of sound sources exceeds the number of microphones. To evaluate the effectiveness, speech enhancement experiments are performed based on computer simulations with a two-element array and three sound sources. Compared with the conventional spectral subtraction method cascaded with the adaptive beamformer, it is shown that the proposed array improves the signal-to-noise ratio of degraded speech by more than 6 dB and performs more than 18% better in word recognition rate when the interfering noise is two speakers.

Journal ArticleDOI
TL;DR: A new method for speech detection using a microphone array is proposed; it thresholds the signal-to-noise ratio (SNR) of each signal segment and outperforms the conventional energy detection method.
Abstract: A new method for speech detection using a microphone array is proposed. An explicit expression is first deduced for representing the signal-to-noise ratio (SNR) of each signal segment. A constant SNR threshold is then used to discriminate between speech and nonspeech signals. Simulation results show that the proposed method outperforms the conventional energy detection method.
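The paper's explicit multi-microphone SNR expression is not reproduced in the abstract; as a single-channel stand-in, the constant-threshold decision rule can be sketched like this (segment length, noise estimate, and threshold are assumptions):

```python
import numpy as np

def detect_speech(signal, noise_power, seg_len=160, threshold_db=5.0):
    """Label each segment speech/nonspeech by comparing its estimated SNR
    (rather than raw energy) against a constant threshold."""
    n_seg = len(signal) // seg_len
    segs = signal[:n_seg * seg_len].reshape(n_seg, seg_len)
    seg_power = np.mean(segs ** 2, axis=1)
    snr_db = 10 * np.log10(np.maximum(seg_power - noise_power, 1e-12) / noise_power)
    return snr_db > threshold_db

# toy example: noise-only lead-in, then a tone standing in for speech
rng = np.random.default_rng(3)
x = 0.1 * rng.standard_normal(1600)
x[800:] += np.sin(2 * np.pi * 200 * np.arange(800) / 8000)
noise_power = np.mean(x[:800] ** 2)       # estimated from a known noise stretch
print(detect_speech(x, noise_power))      # roughly: five False, then five True
```

A fixed SNR threshold, unlike a fixed energy threshold, stays meaningful when the absolute signal level changes, which is the advantage the abstract claims over energy detection.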

Proceedings ArticleDOI
05 Jun 2000
TL;DR: This contribution addresses the problem of speech enhancement in noisy and reverberant rooms through a new approach that combines the dereverberation abilities of a structure based on separate processing of the minimum-phase and all-pass components of the input speech signals with the noise rejection performance of a speech-activity-based Wiener filter able to cope with both coherent and diffuse noise.
Abstract: In this contribution we address the problem of speech enhancement in noisy and reverberant rooms through a new approach that combines the dereverberation abilities of a structure based on separate processing of the minimum-phase and all-pass components of the input speech signals, and the noise rejection performance of a speech-activity-based Wiener filter able to cope with both coherent and diffuse noise. Experiments have been performed with the CMU real multichannel database, which includes a clean speech reference recorded through a head-mounted microphone. This reference signal has also been used to perform simulation experiments under controlled conditions. Extensive results have been obtained, both with log area ratio and cepstral distances of input and processed signals to the reference, and with segmental SNR improvements, assessing the ability of the new system to cope with reverberation and with coherent and diffuse noise in different acoustic environments.

Proceedings ArticleDOI
24 Apr 2000
TL;DR: This paper describes an experimental mobile robot with acoustic source localization capabilities for surveillance and transportation tasks in indoor environments; the system uses a distributed architecture with TCP/IP message passing, and its hardware and software architectures are described.
Abstract: This paper describes an experimental mobile robot with acoustic source localization capabilities for surveillance and transportation tasks in indoor environments. The location of a speaking operator is detected via a microphone array based algorithm; the localization information is passed to a navigation module which sets up a navigation mission using knowledge of the environment map. The system has been developed using a distributed architecture with TCP/IP message passing. We describe the hardware and software architectures, as well as the algorithms. Experimental results describing the system performance in localization tasks are reported.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: Connected strings of seven digits from the TIDIGITS database were recorded in a reverberant office room for evaluation using microphone array processing and hidden Markov model (HMM) adaptation; the number of adaptation utterances and the number of vectors per regression-tree class are studied to optimize MLLR results.
Abstract: Connected strings of seven digits from the TIDIGITS database were recorded in a reverberant office room for evaluation using microphone array processing and hidden Markov model (HMM) adaptation. A sixteen-channel linear microphone array recorded a distant-speech database useful for further experimentation. The adaptation techniques of parallel model combination (PMC) and maximum likelihood linear regression (MLLR) are evaluated and compared. The effects of the number of adaptation utterances and of the number of vectors per class in the regression tree are studied in order to optimize MLLR results. Results show, compared to no adaptation, a 40% word error reduction (improvement to 4.2%) for PMC and a 60% word error reduction (improvement to 3.0%) for MLLR.

Proceedings ArticleDOI
05 Jun 2000
TL;DR: Experimental results of connected digit recognition show that models trained with filtered clean speech obtain better recognition performance than models trained with clean speech, and results show a significant performance increase when incremental adaptation is applied.
Abstract: A challenging scenario is addressed in which a hands-free speech recognizer operates in a noisy office environment with incremental model adaptation functionalities. The use of a single far microphone as well as that of a microphone array input are investigated. In a previous work it was shown that the acoustic mismatch remaining after the application of microphone array processing can be further reduced by conditioning hidden Markov models to the operating acoustic conditions. Conditioned HMMs are models trained using a filtered version of a clean corpus, which is speech material better representing noisy real environments. Afterwards, conditioned models are used as initial models for unsupervised incremental adaptation. Experimental results of connected digit recognition show that models trained with filtered clean speech obtain better recognition performance than models trained with clean speech. Furthermore, results show a significant performance increase when incremental adaptation is applied, even after recognition of only a few utterances.

Journal ArticleDOI
TL;DR: A new robust adaptive microphone array suitable for a two-channel audio input system is proposed; it tracks a target speaker by using two adaptive beamformers and can reduce the target signal distortion.
Abstract: This paper proposes a new robust adaptive microphone array suitable for a two-channel audio input system. This microphone array tracks a target speaker by using two adaptive beamformers and can reduce the target signal distortion that is caused when the target signal is canceled because the target direction differs from the initial look-direction of the microphone array. To enable target tracking, the directional responses of the beamformers are exploited to estimate the directions of arrival (DOA) of the target and the interference. In addition to target tracking, a new arrangement of directional microphones is proposed to reduce the degradation of interference suppression caused by spatial aliasing. Simulation results show that the proposed microphone array can extract the signal from a moving target with high accuracy. © 2000 Scripta Technica, Electron Comm Jpn Pt 3, 83(12): 19–24, 2000

01 Oct 2000
TL;DR: WESTPRAC VII 2000: the 7th West Pacific Regional Acoustics Conference, October 3-5, 2000, Kumamoto, Japan.
Abstract: WESTPRAC VII 2000: the 7th West Pacific Regional Acoustics Conference, October 3-5, 2000, Kumamoto, Japan.

Book ChapterDOI
01 Mar 2000
TL;DR: This chapter addresses the limitations of current approaches to using microphone arrays for speech acquisition, advocates the development of multichannel techniques which employ non-traditional processing and an explicit model of the speech signal, and offers a multi-channel algorithm which incorporates these principles.
Abstract: In this chapter we address the limitations of current approaches to using microphone arrays for speech acquisition and advocate the development of multichannel techniques which employ non-traditional processing and an explicit model of the speech signal. The goal is to combine the advantages of spatial filtering achieved through beamforming with knowledge of the desired time-series attributes and intuitive nonlinear processing. We then offer a multi-channel algorithm which incorporates these principles. The enhanced speech is synthesized using a linear predictive filter. The excitation signal is computed from a nonlinear wavelet-domain process. It uses extrema clustering of the multi-channel speech data to discriminate portions of the linear prediction residual produced by the desired speech signal from those due to multipath effects and uncorrelated noise. The algorithm is shown to be capable of identifying and attenuating reverberant portions of the speech signal and reducing the effects of additive noise.

Patent
12 May 2000
TL;DR: In this article, the authors propose a method to estimate the sound receiving signal at an arbitrary position in a three-dimensional space by using three-dimensionally arranged microphones, where at least three microphones are disposed in one direction and three lines of microphones are arranged without crossing on a plane.
Abstract: PROBLEM TO BE SOLVED: To provide a microphone array device by which the sound receiving signal at an arbitrary position in a three-dimensional space is estimated by three-dimensionally arranged microphones. SOLUTION: At least three microphones (11) are arranged along each spatial axis, or at least three lines of microphones, each line having at least three microphones (11) disposed in one direction, are arranged without crossing on a plane. Taking the plane as a unit, at least three such planes are arranged three-dimensionally without crossing, so as to obtain the boundary conditions for sound estimation on each of the surfaces which constitute the three-dimensional arrangement. A sound receiving signal processing part (12) estimates the sound signal at the arbitrary position based on the temporal change of the sound pressure of the received signal at the arranged microphones (11) along each spatial axis and the spatial change of the received signal between the microphones (11), using the relation between the temporal gradient of sound pressure and the spatial gradient of air particle velocity, and the relation between the spatial gradient of sound pressure and the temporal gradient of air particle velocity.

Proceedings Article
01 Sep 2000
TL;DR: Simulation results and measurements show that speech pause detection improves the overall system performance considerably and enhances the robustness of the speaker localization system by avoiding erroneous position estimates when no speech signal is present.
Abstract: This paper presents a speaker localization system using a microphone array. The array is operated as a steered filter-and-sum beamformer implemented as a summed correlator. In particular, we emphasize the use of a speech pause detector to improve the robustness of the speaker localization system by avoiding erroneous position estimates when no speech signal is present. Simulation results and measurements show that speech pause detection improves the overall system performance considerably.
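The paper's pause detector is more elaborate than a plain energy gate, but the control flow of a steered summed correlator can be sketched as follows. Geometry, thresholds, and the energy-based gate below are stand-in assumptions for illustration.

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def steered_response(signals, mics, candidates, fs):
    """Filter-and-sum beamformer run as a summed correlator: each candidate
    position is scored by the sum of cross-correlations of delay-aligned
    channel pairs; the best-scoring position is the location estimate."""
    scores = []
    for p in candidates:
        delays = np.round(np.linalg.norm(mics - p, axis=1) / C * fs).astype(int)
        aligned = [np.roll(x, -d) for x, d in zip(signals, delays)]
        scores.append(sum(np.dot(aligned[i], aligned[j])
                          for i in range(len(aligned))
                          for j in range(i + 1, len(aligned))))
    return np.array(scores)

def localize_if_speech(signals, mics, candidates, fs, energy_threshold):
    """Suppress position updates during speech pauses so that noise-only
    frames cannot produce erroneous estimates."""
    if np.mean(np.square(signals)) < energy_threshold:
        return None                                   # pause: keep last estimate
    return candidates[np.argmax(steered_response(signals, mics, candidates, fs))]

# toy scene: 4-microphone line array, source at the middle candidate position
fs = 16000
mics = np.stack([np.linspace(-0.3, 0.3, 4), np.zeros(4)], axis=1)
candidates = np.array([[-1.0, 2.0], [0.0, 2.0], [1.0, 2.0]])
rng = np.random.default_rng(4)
s = rng.standard_normal(fs // 10)
delays = np.round(np.linalg.norm(mics - candidates[1], axis=1) / C * fs).astype(int)
signals = np.stack([np.roll(s, d) for d in delays])
print(localize_if_speech(signals, mics, candidates, fs, energy_threshold=0.1))
```

Gating the estimator, rather than filtering its output, is what the paper's simulations and measurements credit for the improved robustness.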

Proceedings ArticleDOI
30 Jul 2000
TL;DR: A large, real-time, working system is described that applies an array of microphones and sophisticated signal processing to obtain high-quality sound pick up from talkers located far from the microphones, without passing microphones among talkers or requiring individuals to approach a microphone station.
Abstract: Poor sound pick up by remote microphones in multimedia applications, conference rooms and auditoria has traditionally hampered recording and communicating among spatially-separated groups. The culprits are reverberation (multipath distortion) and interfering acoustic noise. The typical solution is to obtain close-talking pick up from participants by passing microphones among talkers, or requiring individuals to approach a microphone station. Both are unsatisfactory, time-consuming, and inconvenient. The challenge is to obtain high-quality sound pick up from microphones far from the talker that do not encumber the user by hand-held, body-worn or tethered equipment. One solution is to apply an array of microphones and sophisticated signal processing. A brief description of a large, real-time, working system is presented and early results from using this system are given. Results include measured and theoretical signal-to-noise performance, beampatterns, and the dispersion of location estimates.