Author

Nobuyuki Miyake

Bio: Nobuyuki Miyake is an academic researcher from Kobe University. The author has contributed to research in topics: Noise & Noise reduction. The author has an h-index of 4 and has co-authored 10 publications receiving 49 citations.

Papers
Proceedings ArticleDOI
24 Apr 2008
TL;DR: A new mobile robot with hands-free speech recognition that can determine whether a user's utterances are commands for the robot, discriminating commands from human-human conversation by acoustic features.
Abstract: For a mobile robot to serve people in actual environments, such as a living room or a party room, it must be easy to control because some users might not even be capable of operating a computer keyboard. For nonexpert users, speech recognition is one of the most effective communication tools when it comes to a hands-free (human-robot) interface. This paper describes a new mobile robot with hands-free speech recognition. For a hands-free speech interface, it is important to detect commands for a robot in spontaneous utterances. Our system can determine whether a user's utterances are commands for the robot or not, where commands are discriminated from human-human conversations by acoustic features. The robot can then move according to the user's voice (command). In order to capture the user's voice only, a robust voice detection system with AdaBoost is also described.

13 citations

Journal ArticleDOI
TL;DR: This article describes a method for reducing sudden noise that combines noise detection and classification with noise power estimation; it achieved good recognition performance on utterances overlapped by sudden noises.
Abstract: This paper describes a method for reducing sudden noise using noise detection and classification methods, and noise power estimation. Sudden noise detection and classification have been dealt with in our previous study. In this paper, GMM-based noise reduction is performed using the detection and classification results. As a result of classification, we can determine the kind of noise we are dealing with, but the power is unknown. In this paper, this problem is solved by combining an estimation of noise power with the noise reduction method. In our experiments, the proposed method achieved good performance for recognition of utterances overlapped by sudden noises.

12 citations

Patent
13 Dec 2006
TL;DR: A binary final discriminator is held for each predetermined sound source to indicate whether the noise superposed on a speech section was generated by that source; the discriminator with the highest score identifies the noise source.
Abstract: PROBLEM TO BE SOLVED: To discriminate the kind (sound source) of a noise. SOLUTION: A final discriminator is held for each predetermined sound source; it outputs a binary value indicating whether the noise in noise-superposed speech data (speech with a noise superposed in a speech section) was generated by that sound source. Input noise-superposed speech data is evaluated with the held final discriminator of each predetermined sound source, and the final discriminator with the highest score for one of the binary values is selected from the results. The sound source of the noise in the data is then detected as the predetermined sound source that the selected final discriminator indicates. Further, a plurality of data including noise-superposed speech data is held as data for learning, and boosting is used to derive the final discriminator for each predetermined sound source from the held data for learning. COPYRIGHT: (C)2008,JPO&INPIT

8 citations
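
The scheme in the patent abstract above (one boosted binary discriminator per sound source, with the highest-scoring one naming the source) can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold-stump weak learner, the three-value feature vectors, the source names and the function names `train_adaboost`, `train_per_source` and `detect_source` are all assumptions made for the example.

```python
import math

def stump(x, feature, threshold, polarity):
    """Weak learner: vote +polarity if the feature reaches the threshold."""
    return polarity if x[feature] >= threshold else -polarity

def train_adaboost(samples, labels, rounds=3):
    """Boost threshold stumps into one binary discriminator (labels are +1/-1)."""
    n = len(samples)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(len(samples[0])):
            for thr in sorted({x[f] for x in samples}):
                for pol in (1, -1):
                    err = sum(w for x, y, w in zip(samples, labels, weights)
                              if stump(x, f, thr, pol) != y)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Raise the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * y * stump(x, f, thr, pol))
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def score(model, x):
    """Signed score of a final discriminator; larger means 'this source'."""
    return sum(a * stump(x, f, thr, pol) for a, f, thr, pol in model)

def train_per_source(samples, sources, rounds=3):
    """Derive one final discriminator per sound source (one-vs-rest)."""
    return {s: train_adaboost(samples, [1 if t == s else -1 for t in sources], rounds)
            for s in sorted(set(sources))}

def detect_source(models, x):
    """Return the source whose final discriminator gives the highest score."""
    return max(models, key=lambda s: score(models[s], x))

# Hypothetical training data: three made-up features per noisy speech segment.
samples = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1],   # horn
           [0.1, 0.9, 0.2], [0.2, 0.8, 0.1],   # door
           [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]]   # phone
sources = ["horn", "horn", "door", "door", "phone", "phone"]
models = train_per_source(samples, sources)
print(detect_source(models, [0.85, 0.15, 0.1]))
```

Holding a separate discriminator per source keeps each training problem binary, so a new source can be added by training one more discriminator without retraining the others.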

Proceedings Article
01 Jan 2008
TL;DR: This paper describes a method for reducing sudden noise using noise detection and classification methods, and noise power estimation, and achieves good performance for recognition of utterances overlapped by sudden noises.
Abstract: This paper describes a method for reducing sudden noise using noise detection and classification methods, and noise power estimation. Sudden noise detection and classification have been dealt with in our previous study. In this paper, GMM-based noise reduction is performed using the detection and classification results. As a result of classification, we can determine the kind of noise we are dealing with, but the power is unknown. In this paper, this problem is solved by combining an estimation of noise power with the noise reduction method. In our experiments, the proposed method achieved good performance for recognition of utterances overlapped by sudden noises.

6 citations

Proceedings ArticleDOI
26 Aug 2007
TL;DR: A novel method to detect and classify sudden noises in speech signals using Boosting, which can create a complex, non-linear boundary that determines whether the observed signal is speech, noise1, noise2, or so on.
Abstract: This paper presents a novel method to detect and classify sudden noises in speech signals. There are many sudden and short-period noises in natural environments, such as inside a car. If a speech recognition system can detect sudden noises, it can ask the speaker to repeat the same utterance so that the speech data will be clean. Clean speech input helps prevent system operation errors. In this paper, we tried to detect and classify sudden noises in a user's utterances using Boosting. Boosting can create a complex, non-linear boundary that determines whether the observed signal is speech, noise1, noise2, and so on. In our experiments, the proposed method achieved good performance in comparison to a conventional method based on the GMM (Gaussian Mixture Model).

4 citations


Cited by
DissertationDOI
01 Jan 2014
TL;DR: The approach taken is to interpret the sound event as a two-dimensional spectrogram image, with the two axes as the time and frequency dimensions, which enables novel methods for SER to be developed based on spectrogram image processing, which are inspired by techniques from the field of image processing.
Abstract: The objective of this research is to develop feature extraction and classification techniques for the task of sound event recognition (SER) in unstructured environments. Although this field is traditionally overshadowed by the popular field of automatic speech recognition (ASR), an SER system that can achieve human-like sound recognition performance opens up a range of novel application areas. These include acoustic surveillance, bio-acoustical monitoring, environmental context detection, healthcare applications and more generally the rich transcription of acoustic environments. The challenges in such environments are adverse effects such as noise, distortion and multiple sources, which are more likely to occur with distant microphones compared to the close-talking microphones that are more common in ASR. In addition, the characteristics of acoustic events are less well defined than those of speech, and there is no sub-word dictionary available like the phonemes in speech. Therefore, the performance of ASR systems typically degrades dramatically in these challenging unstructured environments, and it is important to develop new methods that can perform well for this challenging task. In this thesis, the approach taken is to interpret the sound event as a two-dimensional spectrogram image, with the two axes as the time and frequency dimensions. This enables novel methods for SER to be developed based on spectrogram image processing, which are inspired by techniques from the field of image processing. The motivation for such an approach is based on finding an automatic approach to “spectrogram reading”, where it is possible for humans to visually recognise the different sound event signatures in the spectrogram. The advantages of such an approach are twofold. Firstly, the sound event image representation makes it possible to naturally capture the sound information in a two-dimensional feature. This has advantages over conventional one-dimensional frame-based features, which capture only a slice of spectral information

62 citations
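
To make the "spectrogram image" idea in the abstract above concrete, the sketch below turns a one-dimensional signal into a two-dimensional array whose rows are frequency bins and whose columns are time frames. It is only an illustration under assumed settings (a Hann window, a frame length of 16 samples, a hop of 8, a naive DFT and normalisation to the range 0 to 1); the function name `spectrogram_image` is hypothetical and none of these choices come from the thesis.

```python
import cmath
import math

def spectrogram_image(signal, frame_len=16, hop=8):
    """Return a 2D list image[frequency_bin][time_frame] scaled to 0..1."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT magnitude for the non-negative frequency bins.
        column = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                          for n in range(frame_len)))
                  for k in range(n_bins)]
        columns.append(column)
    peak = max(max(c) for c in columns) or 1.0  # avoid dividing by zero
    # Transpose so rows are frequency and columns are time, like an image.
    return [[col[k] / peak for col in columns] for k in range(n_bins)]

# A pure tone with 4 cycles per frame should light up frequency row 4.
tone = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
image = spectrogram_image(tone)
print(len(image), "frequency rows x", len(image[0]), "time columns")
```

Each column of the result is what a conventional frame-based feature would see on its own; keeping the columns side by side is what lets image-style processing use time and frequency context together.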

Patent
Akihiko Sugiyama
05 Mar 2008
TL;DR: A noise suppression device with two components: a shock noise detection unit that detects shock noise in an input signal from changes in that signal, and a shock sound suppression unit that uses the detection result and the input signal to suppress the shock sound.
Abstract: The noise suppression device includes: a shock noise detection unit which receives an input signal including a shock noise and detects a shock noise according to a change of the input signal; and a shock sound suppression unit which receives the shock sound detection result and the input signal so as to suppress the shock sound.

42 citations

Patent
16 Apr 2010
TL;DR: A speech detection apparatus and method that decide whether a frame is speech using feature information extracted from the input signal, estimating the situation of each input frame to determine which feature information speech detection requires in that situation.
Abstract: A speech detection apparatus and method are provided. The speech detection apparatus and method determine whether a frame is speech or not using feature information extracted from an input signal. The speech detection apparatus may estimate a situation related to an input frame and determine which feature information is required for speech detection for the input frame in the estimated situation. The speech detection apparatus may detect a speech signal using dynamic feature information that may be more suitable to the situation of a particular frame, instead of using the same feature information for each and every frame.

15 citations

Proceedings ArticleDOI
14 Mar 2010
TL;DR: A novel method to detect robot-directed (RD) speech that adopts the Multimodal Semantic Confidence (MSC) measure, calculated by integrating speech, image, and motion confidence measures with weightings that are optimized by logistic regression.
Abstract: In this paper, we propose a novel method to detect robot-directed (RD) speech that adopts the Multimodal Semantic Confidence (MSC) measure. The MSC measure is used to decide whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, image, and motion confidence measures with weightings that are optimized by logistic regression. Experimental results show that, compared with a baseline method that uses speech confidence only, MSC achieved an absolute increase of 5% for clean speech and 12% for noisy speech in terms of average maximum F-measure.

12 citations
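
The integration step described in the abstract above, where speech, image and motion confidences are combined with weightings optimised by logistic regression, can be illustrated with the sketch below. It assumes the combined measure is a logistic function of a weighted sum and fits the weights by plain stochastic gradient descent; the function names `msc` and `fit_weights`, the training values and the 0.5 decision threshold are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def msc(weights, confidences):
    """Combine (speech, image, motion) confidences into one value in 0..1."""
    bias, *w = weights
    return sigmoid(bias + sum(wi * ci for wi, ci in zip(w, confidences)))

def fit_weights(rows, labels, lr=0.5, epochs=500):
    """Fit the bias and weightings by logistic regression (plain SGD)."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = msc(weights, x) - y          # gradient of the log loss
            weights[0] -= lr * err
            for i, xi in enumerate(x):
                weights[i + 1] -= lr * err * xi
    return weights

# Made-up training utterances: (speech, image, motion) confidence values,
# labelled 1 for robot-directed speech and 0 otherwise.
rows = [(0.9, 0.8, 0.9), (0.8, 0.9, 0.7), (0.2, 0.3, 0.1), (0.4, 0.1, 0.2)]
labels = [1, 1, 0, 0]
weights = fit_weights(rows, labels)

directed = msc(weights, (0.85, 0.85, 0.8))
chatter = msc(weights, (0.3, 0.2, 0.15))
print("robot-directed" if directed > 0.5 else "not robot-directed")
```

A speech-only baseline corresponds to keeping just the first confidence; the learned weightings decide how much the image and motion evidence may shift the decision.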

Journal ArticleDOI
TL;DR: This article describes a method for reducing sudden noise that combines noise detection and classification with noise power estimation; it achieved good recognition performance on utterances overlapped by sudden noises.
Abstract: This paper describes a method for reducing sudden noise using noise detection and classification methods, and noise power estimation. Sudden noise detection and classification have been dealt with in our previous study. In this paper, GMM-based noise reduction is performed using the detection and classification results. As a result of classification, we can determine the kind of noise we are dealing with, but the power is unknown. In this paper, this problem is solved by combining an estimation of noise power with the noise reduction method. In our experiments, the proposed method achieved good performance for recognition of utterances overlapped by sudden noises.

12 citations