Author

Keisuke Nakamura

Bio: Keisuke Nakamura is an academic researcher from Nara Institute of Science and Technology. The author has contributed to research topics including Noise and Interface (Java). The author has an h-index of 2, having co-authored 2 publications receiving 64 citations.

Papers
Proceedings ArticleDOI
04 Oct 2004
Abstract: ICSLP2004: the 8th International Conference on Spoken Language Processing, October 4-8, 2004, Jeju Island, Korea.

61 citations


Cited by
Journal ArticleDOI
TL;DR: It is theoretically and experimentally pointed out that ICA is proficient in noise estimation under a non-point-source noise condition rather than in speech estimation, and a new blind spatial subtraction array (BSSA) is proposed that utilizes ICA as a noise estimator.
Abstract: We propose a new blind spatial subtraction array (BSSA) consisting of a noise estimator based on independent component analysis (ICA) for efficient speech enhancement. In this paper, we first point out, theoretically and experimentally, that ICA is proficient in noise estimation under a non-point-source noise condition rather than in speech estimation. Therefore, we propose BSSA, which utilizes ICA as a noise estimator. In BSSA, speech extraction is achieved by subtracting the power spectrum of the noise signal estimated using ICA from the power spectrum of the target speech signal partly enhanced by a delay-and-sum beamformer. This "power-spectrum-domain subtraction" procedure enables better noise reduction than conventional ICA, with robustness to estimation errors. Another benefit of the BSSA architecture is "permutation robustness": although the ICA part of BSSA suffers from the source permutation problem, the BSSA architecture can reduce the negative effect when permutation arises. The results of various speech enhancement tests reveal that the noise reduction and speech recognition performance of the proposed BSSA are superior to those of conventional methods.
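The "power-spectrum-domain subtraction" step can be sketched as follows. This is an illustrative reconstruction from the abstract, not the authors' implementation; the function name, the spectral-floor parameter `beta`, and the choice to reuse the beamformer phase are assumptions:

```python
import numpy as np

def power_spectrum_subtraction(ds_output, noise_estimate, beta=0.0):
    """Sketch of power-spectrum-domain subtraction (BSSA-style).

    ds_output:      complex STFT frame from the delay-and-sum beamformer
    noise_estimate: complex STFT frame of the ICA-based noise estimate
    beta:           spectral floor to avoid negative power (assumed parameter)
    """
    # Subtract noise power from the partly enhanced speech power.
    power = np.abs(ds_output) ** 2 - np.abs(noise_estimate) ** 2
    # Floor at a fraction of the beamformer power so power stays non-negative.
    power = np.maximum(power, beta * np.abs(ds_output) ** 2)
    # Only the magnitude is modified; reuse the beamformer phase.
    return np.sqrt(power) * np.exp(1j * np.angle(ds_output))
```

Because subtraction happens on power spectra rather than on complex spectra, small errors in the noise estimate perturb only the magnitude, which is consistent with the estimation-error robustness the abstract claims.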

118 citations

Proceedings Article
01 Aug 2009
TL;DR: A nearly ideal VAD algorithm is proposed which is both easy to implement and noise-robust compared with some previous methods, and which uses short-term features such as Spectral Flatness and Short-term Energy.
Abstract: Voice Activity Detection (VAD) is a very important front-end processing step in all speech and audio processing applications. The performance of most, if not all, speech/audio processing methods depends crucially on the performance of voice activity detection. An ideal voice activity detector needs to be independent of the application area and noise conditions and to require the least parameter tuning in real applications. In this paper, a nearly ideal VAD algorithm is proposed which is both easy to implement and noise-robust compared with some previous methods. The proposed method uses short-term features such as Spectral Flatness (SF) and Short-term Energy, which makes it appropriate for online processing tasks. The proposed method was evaluated on several speech corpora with additive noise and compared with some of the most recently proposed algorithms. The experiments show satisfactory performance in various noise conditions.
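A minimal sketch of a per-frame VAD decision using the two named features, spectral flatness and short-term energy. The thresholds and the simple AND rule are hypothetical illustrations; the paper's actual decision logic is not given in the abstract:

```python
import numpy as np

def vad_frame_features(frame, eps=1e-12):
    """Short-term energy and spectral flatness for one audio frame.

    Spectral flatness is the ratio of the geometric mean to the
    arithmetic mean of the power spectrum (1.0 = perfectly flat).
    """
    energy = np.sum(frame ** 2)
    spec = np.abs(np.fft.rfft(frame)) ** 2 + eps  # eps avoids log(0)
    flatness = np.exp(np.mean(np.log(spec))) / np.mean(spec)
    return energy, flatness

def is_speech(frame, energy_thresh=1e-3, flatness_thresh=0.3):
    """Hypothetical decision rule: speech frames tend to have high
    energy and low spectral flatness (tonal), whereas broadband noise
    and silence are spectrally flatter."""
    energy, flatness = vad_frame_features(frame)
    return energy > energy_thresh and flatness < flatness_thresh
```

A pure tone scores near-zero flatness while white noise scores close to 1, so the flatness feature separates voiced speech from broadband noise even when both have high energy.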

87 citations

Proceedings ArticleDOI
23 Nov 2005
TL;DR: A method to robustly detect smile expressions and laughter sounds by combining an image-based facial expression recognition method with an audio-based laughter sound recognition method; the image-based method could detect smiling faces with more than 80% recall and precision.
Abstract: This paper describes a method to detect smiles and laughter sounds from video of natural dialogue. A smile is the most common facial expression observed in a dialogue. Detecting a user's smiles and laughter sounds can be useful for estimating the mental state of the user of a spoken-dialogue-based user interface. In addition, detecting laughter sounds can be utilized to prevent the speech recognizer from wrongly recognizing a laughter sound as meaningful words. In this paper, we propose a method to robustly detect smile expressions and laughter sounds by combining an image-based facial expression recognition method and an audio-based laughter sound recognition method. The image-based method uses a feature vector based on feature point detection from face images and could detect smiling faces with more than 80% recall and precision. A method combining a GMM-based laughter sound recognizer with the image-based method could improve the accuracy of laughter sound detection compared with methods that use image or sound only. As a result, more than 70% recall and precision of laughter sound detection was obtained from the natural conversation videos.
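The audio side of such a system can be sketched as a GMM log-likelihood ratio that is late-fused with an image-based smile score. Everything below is a hypothetical illustration: the function names, the diagonal-covariance GMMs, and the weighted-sum fusion rule with its parameters are assumptions, since the abstract does not specify the fusion method:

```python
import numpy as np

def gauss_logpdf(x, mean, var):
    """Log-density of one diagonal-covariance Gaussian component."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def gmm_loglik(x, means, variances, weights):
    """Log-likelihood of a feature vector under a diagonal-covariance GMM."""
    comp = [np.log(w) + gauss_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    return np.logaddexp.reduce(comp)  # log-sum-exp over components

def detect_laughter(audio_feat, smile_score,
                    laugh_gmm, speech_gmm, alpha=0.5, thresh=0.0):
    """Late fusion of an audio GMM log-likelihood ratio and an
    image-based smile score (hypothetical weights and threshold)."""
    llr = gmm_loglik(audio_feat, *laugh_gmm) - gmm_loglik(audio_feat, *speech_gmm)
    return alpha * llr + (1 - alpha) * smile_score > thresh
```

Each GMM here is passed as a `(means, variances, weights)` tuple; in practice both models would be trained on labeled laughter and speech frames.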

75 citations

Proceedings ArticleDOI
05 Dec 2011
TL;DR: Assessment of robust tracking of humans based on intelligent Sound Source Localization for a robot in a real environment shows GEVD-MUSIC improved the noise-robustness of SSL by a signal-to-noise ratio of 5–6 dB, and audio-visual integration improved the average tracking error by approximately 50%.
Abstract: We have assessed robust tracking of humans based on intelligent Sound Source Localization (SSL) for a robot in a real environment. SSL is fundamental for robot audition but has three issues in a real environment: robustness against high-power noise, lack of a general framework for selective listening to sound sources, and tracking of inactive and/or noisy sound sources. To address the first issue, we extended MUltiple SIgnal Classification by incorporating Generalized EigenValue Decomposition (GEVD-MUSIC) so that it can deal with high-power noise and can select target sound sources. To address the second issue, we proposed Sound Source Identification (SSI) based on hierarchical Gaussian mixture models and integrated it with GEVD-MUSIC to realize a selective listening function. To address the third issue, we integrated audio-visual human tracking using particle filtering. Integration of these three techniques into an intelligent human tracking system showed that: 1) GEVD-MUSIC improved the noise robustness of SSL by a signal-to-noise ratio of 5–6 dB; 2) SSI achieved an F-measure of more than 70% even in a noisy environment; and 3) audio-visual integration improved the average tracking error by approximately 50%.
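One common way to realize GEVD-MUSIC is to whiten the observed spatial correlation matrix with the noise correlation matrix and then apply standard MUSIC in the whitened space; with an identity noise matrix it reduces to ordinary MUSIC. The sketch below follows that reading of the abstract; the function name and interfaces are assumptions, and the noise matrix `K` must be positive definite:

```python
import numpy as np

def gevd_music_spectrum(R, K, steering_vectors, n_sources):
    """Sketch of a GEVD-MUSIC spatial spectrum for one frequency bin.

    R: (M, M) observed spatial correlation matrix (Hermitian)
    K: (M, M) correlation matrix of the high-power noise, assumed
       pre-measured and positive definite; K = I recovers plain MUSIC
    steering_vectors: (D, M) array, one steering vector per direction
    """
    # Whiten with K^(-1/2) so the generalized eigenproblem R v = lambda K v
    # becomes an ordinary Hermitian eigenproblem.
    w, U = np.linalg.eigh(K)
    K_inv_half = U @ np.diag(w ** -0.5) @ U.conj().T
    Rw = K_inv_half @ R @ K_inv_half.conj().T
    evals, evecs = np.linalg.eigh(Rw)        # ascending eigenvalues
    En = evecs[:, : R.shape[0] - n_sources]  # noise subspace
    P = []
    for a in steering_vectors:
        aw = K_inv_half @ a                  # whitened steering vector
        proj = En.conj().T @ aw              # component in noise subspace
        P.append((aw.conj() @ aw).real / (proj.conj() @ proj).real)
    return np.asarray(P)  # peaks indicate source directions
```

Directions whose whitened steering vector is nearly orthogonal to the noise subspace produce sharp peaks, which is what makes the method selective against the dominant noise captured in K.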

64 citations

Journal ArticleDOI
TL;DR: PARADE is applied to front-end processing for automatic speech recognition (ASR) together with a robust feature extraction method called SPADE (Subband-based Periodicity and Aperiodicity DEcomposition), and it is confirmed that PARADE can improve the performance of front-end processing for ASR.

56 citations