
Showing papers by "Futoshi Asano published in 2003"


Journal ArticleDOI
TL;DR: Two array signal processing techniques are combined with independent component analysis (ICA) to enhance blind separation of acoustic signals in a reflective environment: a subspace method that reduces the effect of room reflections, and a method of solving the frequency-domain permutation problem.
Abstract: Two array signal processing techniques are combined with independent component analysis (ICA) to enhance the performance of blind separation of acoustic signals in a reflective environment. The first technique is the subspace method, which reduces the effect of room reflection when the system is used in a room. Room reflection is one of the biggest problems in blind source separation (BSS) in acoustic environments. The second technique is a method of solving the permutation problem. To employ the subspace method, ICA must be performed in the frequency domain, and the permutation must be resolved consistently at all frequencies. In this method, a physical property of the mixing matrix, namely the coherency across adjacent frequencies, is utilized to solve the permutation. Experiments in a meeting room showed that the subspace method improved the automatic speech recognition rate from 50% to 68%, and that the permutation-solving method closely approached the performance of the correct permutation, differing by only 4% in recognition rate.
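The subspace idea can be illustrated with a minimal numpy sketch (not the authors' code): eigendecompose the spatial correlation matrix of the multichannel spectra at one frequency bin, keep only the dominant (signal) eigenvectors, and project the snapshots onto that subspace so that energy outside it, such as diffuse reverberation, is attenuated. The 4-microphone toy scene and the `subspace_filter` helper below are illustrative assumptions.

```python
import numpy as np

def subspace_filter(X, num_sources):
    """Project multichannel spectra onto the dominant signal subspace.

    X: (channels, frames) complex STFT snapshots at one frequency bin.
    num_sources: assumed number of direct-path sources.
    """
    # Spatial correlation matrix, averaged over frames
    R = X @ X.conj().T / X.shape[1]
    # Eigendecomposition; np.linalg.eigh returns eigenvalues ascending
    w, V = np.linalg.eigh(R)
    # Eigenvectors with the largest eigenvalues span the signal subspace
    Es = V[:, -num_sources:]
    # Orthogonal projection onto the signal subspace
    P = Es @ Es.conj().T
    return P @ X

# Toy example: 4 mics, one dominant source plus weak diffuse noise
rng = np.random.default_rng(0)
steering = np.exp(1j * np.arange(4))[:, None]      # direct-path response
s = rng.standard_normal((1, 200))                  # source signal
noise = 0.1 * (rng.standard_normal((4, 200)) + 1j * rng.standard_normal((4, 200)))
X = steering * s + noise
Y = subspace_filter(X, num_sources=1)
# The discarded residual X - Y lies in the noise subspace, so it is small
print(np.linalg.norm(X - Y) < np.linalg.norm(X))
```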

164 citations


Proceedings ArticleDOI
06 Apr 2003
TL;DR: A support vector machine (SVM) is applied to classify eigenvalue distributions that are not clearly separable; the proposed method is then applied to a source separation system and evaluated via automatic speech recognition.
Abstract: A method of estimating the number of sound sources in a reverberant sound field is proposed in this paper. It is known that the eigenvalue distribution of the spatial correlation matrix calculated from multiple microphone inputs reflects information on the number of sources. In a reverberant sound field, however, this feature of the eigenvalue distribution is degraded by room reverberation. In this paper, a support vector machine is applied to classify the eigenvalue distributions, which are not clearly separable. The proposed method is then applied to a source separation system and evaluated via automatic speech recognition.
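The underlying cue can be seen in a small numpy sketch: with more microphones than sources, the spatial correlation matrix has one large eigenvalue per source and small noise eigenvalues. The naive threshold below (an illustrative assumption, with a made-up `ratio` parameter) works in low reverberation; the paper instead feeds the whole eigenvalue profile to an SVM precisely because reverberation blurs this gap.

```python
import numpy as np

def dominant_eigenvalues(X):
    """Eigenvalues of the spatial correlation matrix, largest first.

    X: (channels, snapshots) array of microphone observations.
    """
    R = X @ X.conj().T / X.shape[1]
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def estimate_num_sources(X, ratio=100.0):
    """Naive count: eigenvalues within `ratio` of the largest are 'signal'."""
    lam = dominant_eigenvalues(X)
    return int(np.sum(lam > lam[0] / ratio))

# Toy example: 6 mics, 2 uncorrelated sources, weak sensor noise
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2))          # mixing (steering) matrix
S = rng.standard_normal((2, 500))        # source signals
X = A @ S + 0.01 * rng.standard_normal((6, 500))
print(estimate_num_sources(X))
```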

48 citations


Proceedings ArticleDOI
08 Jul 2003
TL;DR: From the inference results of the Bayesian network, the information on the time and location of speech events can be obtained in a multiple-sound-source condition.
Abstract: In this paper, a method of detecting speech events in a multiple-sound-source condition using sound and vision information is proposed. Detection of speech events is an important issue for automatic speech recognition operated in a real environment. Furthermore, as stated in this paper, the performance of sound source separation using adaptive beamforming is greatly improved by knowing when and where the target speech event occurs. For this purpose, sound localization using a microphone array and human tracking by stereo vision are combined by a Bayesian network. From the inference results of the Bayesian network, the information on the time and location of speech events can be obtained in a multiple-sound-source condition. Results of an off-line experiment in a real environment with TV and music interference are shown.
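The fusion step of such a network can be sketched as a naive-Bayes combination (an illustration only; the paper's Bayesian network is richer, also inferring event location over time). All likelihood values below are made up for the example, and the observations are assumed conditionally independent given the speech/silence state.

```python
def fuse_speech_evidence(p_audio_given_speech, p_audio_given_silence,
                         p_video_given_speech, p_video_given_silence,
                         prior_speech=0.5):
    """Posterior P(speech | audio, video) under conditional independence."""
    num = prior_speech * p_audio_given_speech * p_video_given_speech
    den = num + (1 - prior_speech) * p_audio_given_silence * p_video_given_silence
    return num / den

# Audio localization weakly favors speech; stereo vision strongly
# confirms a person at the same location
p = fuse_speech_evidence(0.6, 0.4, 0.9, 0.1)
print(round(p, 3))  # combined evidence yields a confident detection
```

The point of the fusion is visible here: neither cue alone is decisive, but agreement between the two pushes the posterior well above either individual likelihood ratio would.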

20 citations


Proceedings Article
01 Jan 2003
TL;DR: For detecting speech events, sound localization using a microphone array and human tracking by stereo vision are combined by a Bayesian network, and a maximum likelihood adaptive beamformer is constructed.
Abstract: In this paper, a method of detecting and separating speech events in a multiple-sound-source condition using audio and video information is proposed. For detecting speech events, sound localization using a microphone array and human tracking by stereo vision are combined by a Bayesian network. From the inference results of the Bayesian network, the information on the time and location of speech events can be obtained in a multiple-sound-source condition. Based on the detected speech event information, a maximum likelihood adaptive beamformer is constructed and the speech signal is separated from the background noise and interference.
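A minimal sketch of the beamforming step, assuming the detected location gives a steering vector d and the noise-plus-interference correlation matrix R is estimated during non-speech segments: the minimum-variance distortionless-response (MVDR) weights w = R⁻¹d / (dᴴR⁻¹d), the standard closed form for this class of maximum likelihood beamformer. The 4-microphone scene below is an illustrative assumption, not the paper's setup.

```python
import numpy as np

def mvdr_weights(R_noise, d):
    """MVDR/ML beamformer weights: w = R^{-1} d / (d^H R^{-1} d).

    R_noise: (M, M) noise-plus-interference spatial correlation matrix.
    d: (M,) steering vector toward the detected speech location.
    """
    Rinv_d = np.linalg.solve(R_noise, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# Toy example: 4 mics, target and interferer from different directions
M = 4
d_target = np.exp(1j * np.pi * 0.2 * np.arange(M))   # target steering vector
d_interf = np.exp(1j * np.pi * 0.8 * np.arange(M))   # interference direction
# Noise correlation: strong interferer plus weak sensor noise
R = np.outer(d_interf, d_interf.conj()) * 4.0 + 0.01 * np.eye(M)
w = mvdr_weights(R, d_target)
print(abs(w.conj() @ d_target))   # distortionless: unit target response
print(abs(w.conj() @ d_interf))   # interference strongly attenuated
```

The distortionless constraint keeps the target response at exactly 1 while the R⁻¹ term steers a spatial null toward the dominant interferer, which is why knowing when and where the target speaks matters so much for estimating R.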

10 citations


Book ChapterDOI
01 Jan 2003
TL;DR: To successfully apply speech recognition technology to conversation understanding in real offices, a multiple microphone array system and a context and attentional manager are implemented in the robot.
Abstract: In order for a mobile robot to provide information services in real offices, the robot has to maintain a map of the office. Rather than taking a completely autonomous approach, we chose to interact with office workers to learn and update a topological map using spoken dialogue. To successfully apply speech recognition technology to conversation understanding in real offices, we implemented a multiple microphone array system and a context and attentional manager in the robot. The robot demonstrated simple map learning, route guidance, and information services about people's locations.

4 citations


Proceedings Article
01 Jan 2003
TL;DR: The proposed method adapts the extended acoustic model to the noise by estimating the population parameters using a Gaussian Mixture Model (GMM) and Gain-Adapted Hidden Markov Model (GA-HMM) decomposition method.
Abstract: In a real environment, it is essential to adapt an acoustic model to variations in background noise in order to realize robust speech recognition. In this paper, we construct an extended acoustic model by combining a mismatch model with a clean acoustic model trained using only clean speech. We assume the mismatch model conforms to a Gaussian distribution with time-varying population parameters. The proposed method adapts the extended acoustic model to the noise by estimating the population parameters using a Gaussian Mixture Model (GMM) and Gain-Adapted Hidden Markov Model (GA-HMM) decomposition method. We performed recognition experiments under noisy conditions using the AURORA2 database to confirm the effectiveness of the proposed method.

2 citations