scispace - formally typeset
Author

F. Itakura

Bio: F. Itakura is an academic researcher at Nagoya University. The author has contributed to research on topics including speech enhancement and microphone arrays, has an h-index of 5, and has co-authored 8 publications receiving 349 citations.

Papers
Proceedings ArticleDOI
05 Jun 2000
TL;DR: This paper describes a new blind signal separation method using the directivity patterns of a microphone array that improves the SNR of degraded speech by about 16 dB under non-reverberant conditions and improves the SNR by 8.7 dB when the reverberation time is 184 ms.
Abstract: This paper describes a new blind signal separation method using the directivity patterns of a microphone array. In this method, to deal with the arrival lags among the microphones, the inverses of the mixing matrices are calculated in the frequency domain so that the separated signals are mutually independent. Since the calculations are carried out in each frequency bin independently, the following problems arise: (1) permutation of the sound sources, and (2) arbitrariness of each source gain. In this paper, we propose a new solution in which directivity patterns are explicitly used to estimate each sound source direction. Signal separation experiments show that the proposed method improves the SNR of degraded speech by about 16 dB under non-reverberant conditions. The proposed method also improves the SNR by 8.7 dB when the reverberation time is 184 ms, and by 5.1 dB when it is 322 ms.

212 citations
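The core trick of the paper above — estimating each source's direction from the directivity pattern of a frequency-bin demixing row — can be sketched for a two-microphone linear array. The geometry, frequency, and angles below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

C = 343.0  # speed of sound [m/s]

def directivity(w, mic_pos, freq, thetas):
    # response of a demixing row w toward each look angle:
    # F(theta) = sum_k w_k * exp(j*2*pi*f*d_k*sin(theta)/c)
    delays = np.outer(np.sin(thetas), mic_pos) / C        # (n_angles, n_mics)
    steering = np.exp(2j * np.pi * freq * delays)
    return np.abs(steering @ w)

# two microphones 4 cm apart, one frequency bin at 2 kHz
mic_pos = np.array([0.0, 0.04])
freq = 2000.0
thetas = np.deg2rad(np.linspace(-90, 90, 721))

# a demixing row constructed to null a source at +30 degrees:
# w satisfies w . a(30) = 0 for the steering vector a(30)
a30 = np.exp(2j * np.pi * freq * mic_pos * np.sin(np.deg2rad(30)) / C)
w = np.array([a30[1], -a30[0]])

# reading the null off the directivity pattern recovers the direction
pattern = directivity(w, mic_pos, freq, thetas)
null_dir = float(np.rad2deg(thetas[np.argmin(pattern)]))
print(round(null_dir, 1))   # → 30.0
```

Matching these estimated directions across frequency bins is what lets the method fix the per-bin permutation and gain ambiguities.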

Proceedings ArticleDOI
17 Oct 1999
TL;DR: In this paper, a simple linear interpolation method and a spline interpolation method are evaluated and the advantages of both are clarified; the results indicate that HRTFs in the median plane can be interpolated by either method.
Abstract: This paper describes the interpolation of head related transfer functions (HRTFs) for all directions in the median plane. The interpolation of HRTFs enables us to reduce the number of measurements needed for a new user's HRTFs, and also to reduce the amount of HRTF data in auditory virtual systems. In this paper, a simple linear interpolation method and the spline interpolation method are evaluated and the advantages of both methods are clarified. In the experiments, the interpolation methods are applied to HRTFs measured using a dummy head. The experimental results show that the two methods are comparable in the best case. The resultant minimum spectral distortion is about 2 dB for both methods. The results clarify that linear interpolation is effective for a set of elevations selected based on the cross correlation, and that spline interpolation is effective at large and equal intervals. These results indicate that HRTFs in the median plane can be interpolated by these methods.

46 citations
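As a rough illustration of the linear method in the paper above, the sketch below interpolates stand-in magnitude responses between two measured elevations and scores the result by spectral distortion. All transfer functions here are synthetic placeholders, not measured HRTFs:

```python
import numpy as np

def interp_hrtf_linear(h_lo, h_hi, elev_lo, elev_hi, elev):
    # weight the two measured spectra by distance to the target elevation
    t = (elev - elev_lo) / (elev_hi - elev_lo)
    return (1.0 - t) * h_lo + t * h_hi

# stand-in magnitude responses at elevations 0 and 30 degrees
freqs = np.linspace(0.0, 22050.0, 512)
h0  = 1.0 / (1.0 + (freqs / 8000.0) ** 2)
h30 = 1.0 / (1.0 + (freqs / 6000.0) ** 2)
h15_true = 1.0 / (1.0 + (freqs / 7000.0) ** 2)   # pretend measurement at 15 deg

h15 = interp_hrtf_linear(h0, h30, 0.0, 30.0, 15.0)

# spectral distortion in dB, the figure of merit used in the paper
sd = float(np.sqrt(np.mean((20*np.log10(h15) - 20*np.log10(h15_true)) ** 2)))
print(sd < 2.0)   # → True
```

Real HRTFs are complex-valued; in practice the magnitude and the delay (or a minimum-phase decomposition) are usually interpolated separately.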

Proceedings ArticleDOI
05 Jun 2000
TL;DR: The experimental results show that the proposed space diversity speech recognition system can attain about 80% accuracy, while the performance of conventional HMMs using close-talking microphones is less than 50%, indicating that the space diversity approach is promising for robust speech recognition under a real acoustic environment.
Abstract: This paper proposes a space diversity speech recognition technique using multiple microphones distributed in a room, as a new paradigm of speech recognition. The key technologies for realizing the system are (1) distant-talking speech recognition and (2) a method for integrating the multiple inputs. In this paper, we propose the use of a distant speech model for distant-talking speech recognition, and feature-based and likelihood-based integration methods for the multiple microphones distributed in the room. The distant speech model is a set of HMMs trained on speech data convolved with the impulse responses measured at several points in the room. The experimental results of simulated distant-talking speech recognition show that the proposed space diversity speech recognition system can attain about 80% accuracy, while the performance of conventional HMMs using close-talking microphones is less than 50%. These results indicate that the space diversity approach is promising for robust speech recognition under a real acoustic environment.

36 citations
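The likelihood-based integration described above reduces to summing per-microphone log-likelihoods before picking a hypothesis. A minimal sketch with made-up scores (the numbers below are invented for illustration):

```python
import numpy as np

# hypothetical log-likelihoods for 3 word hypotheses (columns)
# from 4 distributed microphones (rows)
loglik = np.array([
    [-120.0, -115.0, -130.0],
    [-118.0, -119.0, -125.0],
    [-140.0, -110.0, -135.0],
    [-125.0, -117.0, -128.0],
])

# likelihood-based integration: sum log-likelihoods across microphones,
# then decode once on the combined score
combined = loglik.sum(axis=0)
best = int(np.argmax(combined))
print(best)   # → 1
```

Note that microphone 1 alone would have picked hypothesis 0; pooling evidence across the room corrects individual distant microphones that happen to be poorly placed. The feature-based alternative instead combines the acoustic features before a single recognition pass.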

Journal Article
TL;DR: In this article, a spatial spectral subtraction method using the complementary beamforming microphone array to enhance noisy speech signals for speech recognition is described, which is based on two types of beamformers designed to obtain complementary directivity patterns with respect to each other.
Abstract: This paper describes a spatial spectral subtraction method using the complementary beamforming microphone array to enhance noisy speech signals for speech recognition. The complementary beamforming is based on two types of beamformers designed to obtain complementary directivity patterns with respect to each other. In this paper, it is shown that the nonlinear subtraction processing with complementary beamforming can result in a kind of spectral subtraction without the need for speech pause detection. In addition, the optimization algorithm for the directivity pattern is also described. To evaluate the effectiveness, speech enhancement experiments and speech recognition experiments are performed based on computer simulations under both stationary and nonstationary noise conditions. In comparison with the optimized conventional delay-and-sum (DS) array, it is shown that: (1) the proposed array improves the signal-to-noise ratio (SNR) of degraded speech by about 2 dB and performs more than 20% better in word recognition rates under conditions where white Gaussian noise with an input SNR of −5 or −10 dB is used, and (2) the proposed array performs more than 5% better in word recognition rates under the nonstationary noise conditions. Also, it is shown that these improvements of the proposed array are the same as or superior to those of the conventional spectral subtraction method cascaded with the DS array. Key words: speech enhancement, microphone array, complementary beamforming, spectral subtraction, speech recognition

31 citations
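The nonlinear subtraction step above can be sketched as ordinary power-spectral subtraction in which the complementary beam's output stands in for the usual pause-based noise estimate. The spectra below are toy values, and the paper's actual system also optimizes the directivity patterns:

```python
import numpy as np

def complementary_ss(primary, comp, floor=1e-3):
    # subtract the complementary beam's power spectrum from the
    # primary beam's, floor negative values, keep the primary phase
    p_pow = np.abs(primary) ** 2
    n_pow = np.abs(comp) ** 2
    out_pow = np.maximum(p_pow - n_pow, floor * p_pow)
    return np.sqrt(out_pow) * np.exp(1j * np.angle(primary))

rng = np.random.default_rng(0)
speech = rng.normal(size=257) + 1j * rng.normal(size=257)
noise  = 0.5 * (rng.normal(size=257) + 1j * rng.normal(size=257))

primary = speech + noise    # main beam: target speech plus leaked noise
comp    = noise             # complementary beam: dominated by the noise

out = complementary_ss(primary, comp)
# subtraction can only shrink each bin's magnitude, never grow it
print(bool(np.all(np.abs(out) <= np.abs(primary) + 1e-12)))   # → True
```

Because the noise estimate comes from a second beam rather than from detected speech pauses, the method keeps working under nonstationary noise.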

Proceedings ArticleDOI
05 Jun 2000
TL;DR: An improved complementary beamforming microphone array with a new noise adaptation is described that improves the signal-to-noise ratio of degraded speech by more than 6 dB and performs more than 18% better in word recognition rates when the interfering noise is two speakers.
Abstract: This paper describes an improved complementary beamforming microphone array with a new noise adaptation. Complementary beamforming is based on two types of beamformers designed to obtain complementary directivity patterns. In this system, the two directivity patterns of the beamformers are adapted to the noise directions so that the expected value of each noise power spectrum is minimized. Using this technique, we can realize directional nulls for each noise source even when the number of sound sources exceeds that of the microphones. To evaluate the effectiveness, speech enhancement experiments are performed based on computer simulations with a two-element array and three sound sources. Compared with the conventional spectral subtraction method cascaded with the adaptive beamformer, it is shown that the proposed array improves the signal-to-noise ratio of degraded speech by more than 6 dB and performs more than 18% better in word recognition rates when the interference consists of two speakers.

22 citations
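The adaptation idea above — choosing weights that minimize the expected noise power rather than steering exact nulls — can be sketched for one frequency bin with a two-element array and two interfering speakers. The geometry and signals below are illustrative assumptions; for a unit-norm weight, the minimizer is the eigenvector of the noise covariance with the smallest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(1)
C, FREQ, D = 343.0, 1000.0, 0.05
mic_pos = np.array([0.0, D])

def steer(theta_deg):
    return np.exp(2j * np.pi * FREQ * mic_pos * np.sin(np.deg2rad(theta_deg)) / C)

# two interfering speakers at -50 and +60 degrees (one bin, toy snapshots)
A = np.stack([steer(-50), steer(60)], axis=1)            # (mics, noises)
S = rng.normal(size=(2, 500)) + 1j * rng.normal(size=(2, 500))
X = A @ S                                                # noise-only observations

# minimize the expected noise power E|w^H x|^2 over unit-norm w:
# take the eigenvector of the noise covariance with smallest eigenvalue
R = (X @ X.conj().T) / X.shape[1]
eigval, eigvec = np.linalg.eigh(R)                       # ascending eigenvalues
w = eigvec[:, 0]

adapted = float(np.mean(np.abs(w.conj() @ X) ** 2))
w_ds = np.array([1.0, 1.0]) / np.sqrt(2)                 # fixed delay-and-sum, 0 deg
fixed = float(np.mean(np.abs(w_ds.conj() @ X) ** 2))
print(adapted < fixed)   # → True
```

With two microphones and two noise sources, exact nulls on both are impossible; the adapted weight still passes far less noise than the fixed beam, which is the point of minimizing the expected noise power.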


Cited by
Journal ArticleDOI
TL;DR: By utilizing the harmonics of signals, the new method is robust even for low frequencies where DOA estimation is inaccurate, and provides an almost perfect solution to the permutation problem for a case where two sources were mixed in a room whose reverberation time was 300 ms.
Abstract: Blind source separation (BSS) for convolutive mixtures can be solved efficiently in the frequency domain, where independent component analysis (ICA) is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem: the permutation ambiguity of ICA in each frequency bin should be aligned so that a separated signal in the time-domain contains frequency components of the same source signal. This paper presents a robust and precise method for solving the permutation problem. It is based on two approaches: direction of arrival (DOA) estimation for sources and the interfrequency correlation of signal envelopes. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit their respective advantages. Furthermore, by utilizing the harmonics of signals, we make the new method robust even for low frequencies where DOA estimation is inaccurate. We also present a new closed-form formula for estimating DOAs from a separation matrix obtained by ICA. Experimental results show that our method provided an almost perfect solution to the permutation problem for a case where two sources were mixed in a room whose reverberation time was 300 ms.

644 citations
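The envelope-correlation half of the method above can be sketched as follows: a bin whose separated envelopes correlate better with the swapped reference envelopes gets its outputs swapped. The envelopes and the permutation below are synthetic, hand-picked illustrations:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(400)

# two sources whose amplitude envelopes are shared across frequency bins
env = np.stack([1.0 + np.sin(0.05 * t) ** 2,
                1.0 + np.cos(0.113 * t) ** 2])           # (2 sources, frames)

n_bins = 6
true_perm = [0, 0, 1, 0, 1, 1]    # 1 = ICA swapped the two outputs in that bin
Y = []
for f in range(n_bins):
    phases = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(2, t.size)))
    bin_sig = env * phases * (0.5 + 0.1 * f)             # per-bin scaling
    if true_perm[f]:
        bin_sig = bin_sig[::-1]
    Y.append(bin_sig)

# align permutations: correlate each bin's envelopes with bin 0's
ref = np.abs(Y[0])
est_perm = []
for f in range(n_bins):
    e = np.abs(Y[f])
    keep = np.corrcoef(e[0], ref[0])[0, 1] + np.corrcoef(e[1], ref[1])[0, 1]
    swap = np.corrcoef(e[0], ref[1])[0, 1] + np.corrcoef(e[1], ref[0])[0, 1]
    est_perm.append(0 if keep >= swap else 1)

print(est_perm == true_perm)   # → True
```

The paper's contribution is integrating this envelope criterion with DOA estimates (plus signal harmonics) so that bins where one cue is unreliable can fall back on the other.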

Journal ArticleDOI
TL;DR: A new algorithm is proposed that exploits higher order frequency dependencies of source signals in order to separate them when they are mixed and outperforms the others in most cases.
Abstract: Blind source separation (BSS) is a challenging problem in real-world environments where sources are time delayed and convolved. The problem becomes more difficult in very reverberant conditions, with an increasing number of sources, and with geometric configurations of the sources such that finding directionality is not sufficient for source separation. In this paper, we propose a new algorithm that exploits higher order frequency dependencies of source signals in order to separate them when they are mixed. In the frequency domain, this formulation assumes that dependencies exist between frequency bins instead of defining independence for each frequency bin. In this manner, we can avoid the well-known frequency permutation problem. To derive the learning algorithm, we define a cost function, which is an extension of mutual information between multivariate random variables. By introducing a source prior that models the inherent frequency dependencies, we obtain a simple form of a multivariate score function. In experiments, we generate simulated data with various kinds of sources in various environments. We evaluate the performance and compare it with that of other well-known algorithms. The results show that the proposed algorithm outperforms the others in most cases. The algorithm is also able to accurately recover six sources with six microphones. In this case, we can obtain about 16-dB signal-to-interference ratio (SIR) improvement. Similar performance is observed in real conference room recordings with three human speakers reading sentences and one loudspeaker playing music.

426 citations
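One standard choice for the multivariate score function mentioned above is the spherical prior p(y) ∝ exp(−‖y‖), which gives φ_f(y) = y_f/‖y‖ and ties all of a source's frequency bins together. A minimal sketch of that coupling, with toy values (this is a common choice in this family of methods, not necessarily the paper's exact prior):

```python
import numpy as np

def iva_score(y):
    # multivariate score for a spherical prior p(y) ∝ exp(-||y||):
    # phi_f(y) = y_f / ||y||, so every bin's score depends on all bins
    return y / np.linalg.norm(y)

# one source's frequency components across 4 bins (toy values)
y = np.array([1.0 + 1.0j, 0.5 - 0.2j, -0.3 + 0.8j, 0.1 + 0.0j])
s1 = iva_score(y)

# scaling a single bin changes the score of *every* bin, unlike a
# per-bin univariate score such as y_f / |y_f|
y2 = y.copy()
y2[0] *= 3.0
s2 = iva_score(y2)
print(bool(abs(s1[1] - s2[1]) > 1e-6))   # → True
```

It is exactly this cross-bin dependence in the score that keeps all frequency components of one source aligned, removing the permutation problem by construction.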

Journal ArticleDOI
TL;DR: It is shown that there is an optimum frame size that is determined by the trade-off between maintaining the number of samples in each frequency bin to estimate statistics and covering the whole reverberation, and that it is not good to be constrained by the condition T>P.
Abstract: Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, the separation performance is still not good enough. In particular, when the impulse responses are long, performance is highly limited. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it is not good to be constrained by the condition T>P, where T is the frame length of the DFT and P is the length of the room impulse responses. We show that there is an optimum frame size that is determined by the trade-off between maintaining the number of samples in each frequency bin to estimate statistics and covering the whole reverberation. We also clarify the reason for the poor performance of BSS in long reverberant environments, highlighting that the framework of BSS works as two sets of frequency-domain adaptive beamformers. Although BSS can reduce reverberant sounds to some extent like adaptive beamformers, they mainly remove the sounds from the jammer direction. This is the reason for the difficulty of BSS in reverberant environments.

360 citations
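The trade-off described above can be caricatured numerically: a longer frame covers more of the room impulse response, but leaves fewer frames per frequency bin for estimating the statistics ICA needs. The scoring function below is a toy heuristic invented for illustration, not the paper's criterion:

```python
FS = 8000                  # sample rate [Hz]
SIGNAL_LEN = FS * 6        # 6 s of data
P = int(0.3 * FS)          # 300 ms room impulse response (2400 samples)

def tradeoff_score(T, signal_len=SIGNAL_LEN, p=P):
    # coverage: fraction of the RIR that fits inside one frame
    # reliability: do we have enough frames per bin for the statistics?
    shift = T // 2
    n_frames = (signal_len - T) // shift + 1
    coverage = min(T / p, 1.0)
    reliability = min(n_frames / 100.0, 1.0)
    return coverage * reliability

frame_sizes = [256, 512, 1024, 2048, 4096, 8192]
scores = {T: tradeoff_score(T) for T in frame_sizes}
best_T = max(scores, key=scores.get)
print(best_T)   # → 1024
```

Even in this crude model the optimum frame (1024 samples) is shorter than the impulse response (2400 samples), echoing the paper's point that the constraint T > P is not the right one.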

Journal ArticleDOI
TL;DR: This paper addresses the determined blind source separation problem and proposes a new effective method unifying independent vector analysis (IVA) and nonnegative matrix factorization (NMF) based on conventional multichannel NMF (MNMF), which reveals the relationship between MNMF and IVA.
Abstract: This paper addresses the determined blind source separation problem and proposes a new effective method unifying independent vector analysis (IVA) and nonnegative matrix factorization (NMF). IVA is a state-of-the-art technique that utilizes the statistical independence between sources in a mixture signal, and an efficient optimization scheme has been proposed for IVA. However, since the source model in IVA is based on a spherical multivariate distribution, IVA cannot utilize specific spectral structures such as the harmonic structures of pitched instrumental sounds. To solve this problem, we introduce NMF decomposition as the source model in IVA to capture the spectral structures. The formulation of the proposed method is derived from conventional multichannel NMF (MNMF), which reveals the relationship between MNMF and IVA. The proposed method can be optimized by the update rules of IVA and single-channel NMF. Experimental results show the efficacy of the proposed method compared with IVA and MNMF in terms of separation accuracy and convergence speed.

296 citations
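The NMF source model above rests on multiplicative updates. The sketch below is a generic Euclidean-distance NMF with the classic update rules; the paper's method actually uses Itakura–Saito-divergence updates inside a multichannel framework:

```python
import numpy as np

rng = np.random.default_rng(3)

def nmf(V, n_basis, n_iter=200, eps=1e-12):
    # multiplicative updates for V ≈ W @ H with W, H >= 0
    # (Euclidean cost; each update cannot increase the fit error)
    F, T = V.shape
    W = rng.random((F, n_basis)) + eps
    H = rng.random((n_basis, T)) + eps
    errs = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        errs.append(np.linalg.norm(V - W @ H))
    return W, H, errs

# toy nonnegative "power spectrogram" with exact rank-2 structure
V = rng.random((16, 2)) @ rng.random((2, 40))
W, H, errs = nmf(V, n_basis=2)
print(bool(errs[-1] < errs[0]))   # → True: the fit error decreases
```

In the proposed method this low-rank W @ H model replaces IVA's spherical source prior, letting the separator exploit harmonic spectral structure that a spherical prior cannot represent.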

Journal ArticleDOI
TL;DR: The signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions, and the temporal alternation between ICA and beamforming can realize fast- and high-convergence optimization.
Abstract: We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the slow-convergence problem through optimization in ICA. The proposed method consists of the following three parts: (a) frequency-domain ICA with direction-of-arrival (DOA) estimation, (b) null beamforming based on the estimated DOA, and (c) integration of (a) and (b) based on the algorithm diversity in both iteration and frequency domain. The unmixing matrix obtained by ICA is temporally substituted by the matrix based on null beamforming through iterative optimization, and the temporal alternation between ICA and beamforming can realize fast- and high-convergence optimization. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions.

226 citations
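Step (b) above, null beamforming from estimated DOAs, can be sketched for a 2×2 case at one frequency bin. The array geometry and DOAs below are illustrative assumptions; in the real system the DOAs come from the ICA unmixing matrix of step (a):

```python
import numpy as np

C, FREQ = 343.0, 1500.0
mic_pos = np.array([0.0, 0.06])

def steer(theta_deg):
    return np.exp(2j * np.pi * FREQ * mic_pos * np.sin(np.deg2rad(theta_deg)) / C)

def null_beamformer(doa1, doa2):
    # 2x2 unmixing matrix: row i steers a spatial null toward the
    # *other* source's DOA, so each output keeps only one source
    a1, a2 = steer(doa1), steer(doa2)
    w1 = np.array([a2[1], -a2[0]])    # w1 . a2 = 0  -> nulls source 2
    w2 = np.array([a1[1], -a1[0]])    # w2 . a1 = 0  -> nulls source 1
    return np.stack([w1, w2])

rng = np.random.default_rng(4)
s = rng.normal(size=(2, 300)) + 1j * rng.normal(size=(2, 300))
A = np.stack([steer(-20), steer(40)], axis=1)     # mixing at one frequency bin
x = A @ s

W = null_beamformer(-20.0, 40.0)
# W @ A is diagonal: each output is a scaled copy of exactly one source
print(bool(np.allclose(W @ A, np.diag(np.diag(W @ A)), atol=1e-10)))   # → True
```

This closed-form matrix is what temporarily replaces the ICA unmixing matrix during the alternation, giving the iteration a fast, geometry-based restart whenever ICA's convergence stalls.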