Author

Masato Miyoshi

Bio: Masato Miyoshi is an academic researcher from Nippon Telegraph and Telephone. The author has contributed to research in topics: Reverberation & Speech processing. The author has an h-index of 22 and has co-authored 114 publications receiving 2785 citations. Previous affiliations of Masato Miyoshi include Hokkaido University & Kanazawa University.


Papers
Journal ArticleDOI
TL;DR: In this article, a novel method is proposed for realizing exact inverse filtering of acoustic impulse responses in a room, based on the principle called the multiple-input/output inverse theorem (MINT).
Abstract: A novel method is proposed for realizing exact inverse filtering of acoustic impulse responses in a room. This method is based on the principle called the multiple-input/output inverse theorem (MINT). The inverse is constructed from multiple finite-impulse response (FIR) filters (transversal filters) by adding some extra acoustic signal-transmission channels produced by multiple loudspeakers or microphones. The coefficients of these FIR filters can be computed by the well-known rules of matrix algebra. Inverse filtering in a sound field is investigated experimentally. It is shown that the proposed method is greatly superior to previous methods that use only one acoustic signal-transmission channel. The results prove the possibility of sound reproduction and sound reception without any distortion caused by reflected sounds.

734 citations
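
The abstract above says the inverse is built from several FIR filters whose coefficients follow from matrix algebra. A minimal sketch of that idea for two channels is given below. It assumes two equal-length impulse responses with no common zeros, and the function names (`convolution_matrix`, `mint_inverse`) and the random test responses are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def convolution_matrix(g, n_cols):
    """Build the matrix G such that G @ h equals np.convolve(g, h) for len(h) == n_cols."""
    n_rows = len(g) + n_cols - 1
    G = np.zeros((n_rows, n_cols))
    for j in range(n_cols):
        G[j:j + len(g), j] = g  # column j is g delayed by j samples
    return G

def mint_inverse(g1, g2):
    """Find FIR filters h1, h2 with g1*h1 + g2*h2 equal to a unit impulse.

    Assumption: len(g1) == len(g2) == L and the two responses share no zeros,
    so the stacked (2L-2) x (2L-2) matrix is invertible.
    """
    L = len(g1)
    n = L - 1  # filter length that makes the stacked system square
    G = np.hstack([convolution_matrix(g1, n), convolution_matrix(g2, n)])
    d = np.zeros(G.shape[0])
    d[0] = 1.0  # target: a unit impulse with no delay
    h = np.linalg.solve(G, d)
    return h[:n], h[n:]

# Hypothetical "room" responses: random taps stand in for measured data.
rng = np.random.default_rng(0)
g1 = rng.standard_normal(8)
g2 = rng.standard_normal(8)

h1, h2 = mint_inverse(g1, g2)
total = np.convolve(g1, h1) + np.convolve(g2, h2)  # overall response of both paths
d = np.zeros(len(total))
d[0] = 1.0
print(np.max(np.abs(total - d)))  # deviation from an ideal impulse
```

With a single channel the corresponding system is overdetermined and in general has no exact FIR solution, which is why the second channel matters in this sketch.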

Journal ArticleDOI
TL;DR: NDLP can robustly estimate an inverse system for late reverberation in the presence of noise without greatly distorting a direct speech signal and can be implemented in a computationally efficient manner in the time-frequency domain.
Abstract: This paper proposes a statistical model-based speech dereverberation approach that can cancel the late reverberation of a reverberant speech signal captured by distant microphones without prior knowledge of the room impulse responses. With this approach, the generative model of the captured signal is composed of a source process, which is assumed to be a Gaussian process with a time-varying variance, and an observation process modeled by a delayed linear prediction (DLP). The optimization objective for the dereverberation problem is derived to be the sum of the squared prediction errors normalized by the source variances; hence, this approach is referred to as variance-normalized delayed linear prediction (NDLP). Inheriting the characteristic of DLP, NDLP can robustly estimate an inverse system for late reverberation in the presence of noise without greatly distorting a direct speech signal. In addition, owing to the use of variance normalization, NDLP allows us to improve the dereverberation result especially with relatively short (of the order of a few seconds) observations. Furthermore, NDLP can be implemented in a computationally efficient manner in the time-frequency domain. Experimental results demonstrate the effectiveness and efficiency of the proposed approach in comparison with two existing approaches.

371 citations
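
The objective named in the abstract above, the sum of squared prediction errors normalized by the source variances, can be illustrated for one channel and one frequency bin. The sketch below is an assumption-laden illustration rather than the paper's algorithm: the alternating update, the use of floored instantaneous power as the variance estimate, the synthetic signal, and all names (`delayed_lp_matrix`, `ndlp`, `delay`, `order`, `floor`) are hypothetical.

```python
import numpy as np

def delayed_lp_matrix(x, delay, order):
    """Row t holds x[t-delay], ..., x[t-delay-order+1], with zeros before the start."""
    T = len(x)
    X = np.zeros((T, order), dtype=x.dtype)
    for k in range(order):
        shift = delay + k
        X[shift:, k] = x[:T - shift]
    return X

def ndlp(x, delay=2, order=3, n_iter=3, floor=1e-8):
    """Alternate between variance estimation and weighted least squares.

    Returns the prediction residual d, the prediction filter g, and the
    variances used in the final weighted solve.
    """
    X = delayed_lp_matrix(x, delay, order)
    d = x.copy()  # start from the observed signal
    for _ in range(n_iter):
        var = np.maximum(np.abs(d) ** 2, floor)  # assumed variance estimate
        Xw = X.conj().T / var                    # X^H scaled by 1/var per frame
        g = np.linalg.solve(Xw @ X, Xw @ x)      # minimizes sum |x - X g|^2 / var
        d = x - X @ g                            # remove the predicted late part
    return d, g, var

# Synthetic single-bin signal: a source with time-varying variance plus
# a delayed recursive tail standing in for late reverberation.
rng = np.random.default_rng(0)
T = 400
envelope = 1.0 + np.sin(2 * np.pi * np.arange(T) / 50) ** 2
s = envelope * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
c = np.array([0.4, -0.3, 0.2])
x = np.zeros(T, dtype=complex)
for t in range(T):
    x[t] = s[t]
    for k, ck in enumerate(c):
        if t - 2 - k >= 0:
            x[t] += ck * x[t - 2 - k]

floor = 1e-3 * np.mean(np.abs(x) ** 2)
d, g, var = ndlp(x, delay=2, order=3, n_iter=3, floor=floor)
cost_fit = np.sum(np.abs(d) ** 2 / var)   # normalized error with the fitted filter
cost_zero = np.sum(np.abs(x) ** 2 / var)  # normalized error with no filtering
print(cost_fit, cost_zero)
```

The delay keeps the most recent frames out of the predictor, which is what lets the direct signal pass largely untouched in this kind of scheme.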

Journal ArticleDOI
TL;DR: A room impulse response is assumed to consist of three parts: a direct-path response, early reflections, and late reverberations, the last of which are known to be a major cause of ASR performance degradation.
Abstract: A speech signal captured by a distant microphone is generally smeared by reverberation, which severely degrades automatic speech recognition (ASR) performance. One way to solve this problem is to dereverberate the observed signal prior to ASR. In this paper, a room impulse response is assumed to consist of three parts: a direct-path response, early reflections and late reverberations. Since late reverberations are known to be a major cause of ASR performance degradation, this paper focuses on dealing with the effect of late reverberations. The proposed method first estimates the late reverberations using long-term multi-step linear prediction, and then reduces the late reverberation effect by employing spectral subtraction. The algorithm provided good dereverberation with training data corresponding to the duration of one speech utterance, in our case, less than 6 s. This paper describes the proposed framework for both single-channel and multichannel scenarios. Experimental results showed substantial improvements in ASR performance with real recordings under severe reverberant conditions.

186 citations

Journal ArticleDOI
TL;DR: The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s.
Abstract: This paper proposes a method for performing blind source separation (BSS) and blind dereverberation (BD) at the same time for speech mixtures. In most previous studies, BSS and BD have been investigated separately. The separation performance of conventional BSS methods deteriorates as the reverberation time increases while many existing BD methods rely on the assumption that there is only one sound source in a room. Therefore, it has been difficult to perform both BSS and BD when the reverberation time is long. The proposed method uses a network, in which dereverberation and separation networks are connected in tandem, to estimate source signals. The parameters for the dereverberation network (prediction matrices) and those for the separation network (separation matrices) are jointly optimized. This enables a BD process to take a BSS process into account. The prediction and separation matrices are alternately optimized with each depending on the other; hence, we call the proposed method the conditional separation and dereverberation (CSD) method. Comprehensive evaluation results are reported, where all the speech materials contained in the complete test set of the TIMIT corpus are used. The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s. The direct-to-reverberation ratio is also improved by about 10 dB.

164 citations

Proceedings ArticleDOI
12 May 2008
TL;DR: Methods for implementing MCLP-based speech dereverberation that allow it to work in the short-time Fourier transform (STFT) domain with much less computing cost are presented.
Abstract: It has recently been shown that the use of the time-varying nature of speech signals allows us to achieve high-quality speech dereverberation based on multi-channel linear prediction (MCLP). However, this approach requires a huge computing cost for calculating large covariance matrices in the time domain. In addition, we face the important problem of how to combine the speech dereverberation efficiently with many other useful speech enhancement techniques in the short-time Fourier transform (STFT) domain. As the first step to overcoming these problems, this paper presents methods for implementing MCLP-based speech dereverberation that allow it to work in the STFT domain with much less computing cost. The effectiveness of the present methods is confirmed by experiments in terms of the recovered signal quality and the computing time.

136 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: Covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, sequential data, and combining models.
Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Journal ArticleDOI
01 Oct 1980

1,565 citations

Journal ArticleDOI
TL;DR: The importance of having a clear understanding of the principles behind both the acoustics and the electrical control in order to appreciate the advantages and limitations of active noise control is emphasized.
Abstract: Active noise control exploits the long wavelengths associated with low frequency sound. It works on the principle of destructive interference between the sound fields generated by the original primary sound source and that due to other secondary sources, acoustic outputs of which can be controlled. The acoustic objectives of different active noise control systems and the electrical control methodologies that are used to achieve these objectives are examined. The importance of having a clear understanding of the principles behind both the acoustics and the electrical control in order to appreciate the advantages and limitations of active noise control is emphasized. A brief discussion of the physical basis of active sound control that concentrates on three-dimensional sound fields is presented.

965 citations

Journal ArticleDOI
TL;DR: In this article, a novel method is proposed for realizing exact inverse filtering of acoustic impulse responses in a room, based on the principle called the multiple-input/output inverse theorem (MINT).
Abstract: A novel method is proposed for realizing exact inverse filtering of acoustic impulse responses in a room. This method is based on the principle called the multiple-input/output inverse theorem (MINT). The inverse is constructed from multiple finite-impulse response (FIR) filters (transversal filters) by adding some extra acoustic signal-transmission channels produced by multiple loudspeakers or microphones. The coefficients of these FIR filters can be computed by the well-known rules of matrix algebra. Inverse filtering in a sound field is investigated experimentally. It is shown that the proposed method is greatly superior to previous methods that use only one acoustic signal-transmission channel. The results prove the possibility of sound reproduction and sound reception without any distortion caused by reflected sounds.

734 citations

Journal Article
TL;DR: The theory of recording and reproduction of three-dimensional sound fields based on spherical harmonics is reviewed and extended in this paper, where mode-matching and simple source approaches to sound reproduction in anechoic environments are discussed.
Abstract: The theory of recording and reproduction of three-dimensional sound fields based on spherical harmonics is reviewed and extended. Free-field, sphere, and general recording arrays are reviewed, and the mode-matching and simple source approaches to sound reproduction in anechoic environments are discussed. Both methods avoid the need for both monopole and dipole loudspeakers—as required by the Kirchhoff–Helmholtz integral. An error analysis is presented and simulation examples are given. It is also shown that the theory can be extended to sound reproduction in reverberant environments.

467 citations