Home
/
Authors
/
Hiroshi G. Okuno

Author

Hiroshi G. Okuno

Other affiliations: Meiji University, KEK, Commonwealth Scientific and Industrial Research Organisation ...read more

Bio: Hiroshi G. Okuno is an academic researcher from Waseda University. The author has contributed to research in topics: Humanoid robot & Acoustic source localization. The author has an hindex of 48, co-authored 573 publications receiving 9897 citations. Previous affiliations of Hiroshi G. Okuno include Meiji University & KEK.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988
1987
1986
1984
1983
1978
1975
1958
1957
1956

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Audio-visual speech recognition using deep learning

[...]

Kuniaki Noda¹, Yuki Yamaguchi², Kazuhiro Nakadai³, Hiroshi G. Okuno², Tetsuya Ogata¹ - Show less +1 more•Institutions (3)

Waseda University¹, Kyoto University², Honda³

01 Jun 2015-Applied Intelligence

TL;DR: A connectionist-hidden Markov model (HMM) system for noise-robust AVSR is introduced and it is demonstrated that approximately 65 % word recognition rate gain is attained with denoised MFCCs under 10 dB signal-to-noise-ratio (SNR) for the audio signal input.

...read moreread less

Abstract: Audio-visual speech recognition (AVSR) system is thought to be one of the most promising solutions for reliable speech recognition, particularly when the audio is corrupted by noise. However, cautious selection of sensory features is crucial for attaining high recognition performance. In the machine-learning community, deep learning approaches have recently attracted increasing attention because deep neural networks can effectively extract robust latent features that enable various recognition algorithms to demonstrate revolutionary generalization capabilities under diverse application conditions. This study introduces a connectionist-hidden Markov model (HMM) system for noise-robust AVSR. First, a deep denoising autoencoder is utilized for acquiring noise-robust audio features. By preparing the training data for the network with pairs of consecutive multiple steps of deteriorated audio features and the corresponding clean features, the network is trained to output denoised audio features from the corresponding features deteriorated by noise. Second, a convolutional neural network (CNN) is utilized to extract visual features from raw mouth area images. By preparing the training data for the CNN as pairs of raw images and the corresponding phoneme label outputs, the network is trained to predict phoneme labels from the corresponding mouth area input images. Finally, a multi-stream HMM (MSHMM) is applied for integrating the acquired audio and visual HMMs independently trained with the respective features. By comparing the cases when normal and denoised mel-frequency cepstral coefficients (MFCCs) are utilized as audio features to the HMM, our unimodal isolated word recognition results demonstrate that approximately 65 % word recognition rate gain is attained with denoised MFCCs under 10 dB signal-to-noise-ratio (SNR) for the audio signal input. Moreover, our multimodal isolated word recognition results utilizing MSHMM with denoised MFCCs and acquired visual features demonstrate that an additional word recognition rate gain is attained for the SNR conditions below 10 dB.

...read moreread less

493 citations

Proceedings Article•

Active Audition for Humanoid

[...]

Kazuhiro Nakadai, Tino Lourens, Hiroshi G. Okuno, Hiroaki Kitano

30 Jul 2000

TL;DR: The experimental result demonstrates that the active audition by integration of audition, vision, and motor control enables sound source tracking in variety of conditions.

...read moreread less

Abstract: In this paper, we present an active audition system for humanoid robot “SIG the humanoid” . The audition system of the highly intelligent humanoid requires localization of sound sources and identification of meanings of the sound in the auditory scene. The active audition reported in this paper focuses on improved sound source tracking by integrating audition, vision, and motor movements. Given the multiple sound sources in the auditory scene, SIG actively moves its head to improve localization by aligning microphones orthogonal to the sound source and by capturing the possible sound sources by vision. However, such an active head movement inevitably creates motor noise. The system must adaptively cancel motor noise using motor control signals. The experimental result demonstrates that the active audition by integration of audition, vision, and motor control enables sound source tracking in variety of conditions.

...read moreread less

247 citations

Journal Article•DOI•

Design and Implementation of Robot Audition System 'HARK' — Open Source Software for Listening to Three Simultaneous Speakers

[...]

Kazuhiro Nakadai¹, Toru Takahashi², Hiroshi G. Okuno², Hirofumi Nakajima³, Yuji Hasegawa³, Hiroshi Tsujino³ - Show less +2 more•Institutions (3)

Tokyo Institute of Technology¹, Kyoto University², Honda³

01 Apr 2010-Advanced Robotics

TL;DR: The design and implementation of the HARK robot audition software system consisting of sound source localization modules, sound source separation modules and automatic speech recognition modules of separated speech signals that works on any robot with any microphone configuration are presented.

...read moreread less

Abstract: This paper presents the design and implementation of the HARK robot audition software system consisting of sound source localization modules, sound source separation modules and automatic speech re...

...read moreread less

209 citations

Journal Article•DOI•

Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization

[...]

Takuya Yoshioka¹, Tomohiro Nakatani¹, Masato Miyoshi², Hiroshi G. Okuno³•Institutions (3)

Nippon Telegraph and Telephone¹, Kanazawa University², Kyoto University³

01 Jan 2011-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s.

...read moreread less

Abstract: This paper proposes a method for performing blind source separation (BSS) and blind dereverberation (BD) at the same time for speech mixtures. In most previous studies, BSS and BD have been investigated separately. The separation performance of conventional BSS methods deteriorates as the reverberation time increases while many existing BD methods rely on the assumption that there is only one sound source in a room. Therefore, it has been difficult to perform both BSS and BD when the reverberation time is long. The proposed method uses a network, in which dereverberation and separation networks are connected in tandem, to estimate source signals. The parameters for the dereverberation network (prediction matrices) and those for the separation network (separation matrices) are jointly optimized. This enables a BD process to take a BSS process into account. The prediction and separation matrices are alternately optimized with each depending on the other; hence, we call the proposed method the conditional separation and dereverberation (CSD) method. Comprehensive evaluation results are reported, where all the speech materials contained in the complete test set of the TIMIT corpus are used. The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s. The direct-to-reverberation ratio is also improved by about 10 dB.

...read moreread less

164 citations

Proceedings Article•

Hybrid collaborative and content-based music recommendation using probabilistic model with latent user preferences

[...]

Kazuyoshi Yoshii¹, Masataka Goto², Kazunori Komatani¹, Tetsuya Ogata¹, Hiroshi G. Okuno¹ - Show less +1 more•Institutions (2)

Kyoto University¹, National Institute of Advanced Industrial Science and Technology²

01 Dec 2006

TL;DR: A hybrid music recommendation method that solves problems of two prominent conventional methods: collaborative filtering and content-based recommendation and can reasonably recommend pieces even if they have no ratings is presented.

...read moreread less

Abstract: This paper presents a hybrid music recommendation method that solves problems of two prominent conventional methods: collaborative filtering and content-based recommendation. The former cannot recommend musical pieces that have no ratings because recommendations are based on actual user ratings. In addition, artist variety in recommended pieces tends to be poor. The latter, which recommends musical pieces that are similar to users’ favorites in terms of music content, has not been fully investigated. This induces unreliability in modeling of user preferences; the content similarity does not completely reflect the preferences. Our method integrates both rating and content data by using a Bayesian network called an aspect model. Unobservable user preferences are directly represented by introducing latent variables, which are statistically estimated. To verify our method, we conducted experiments by using actual audio signals of Japanese songs and the corresponding rating data collected from Amazon. The results showed that our method outperforms the two conventional methods in terms of recommendation accuracy and artist variety and can reasonably recommend pieces even if they have no ratings.

...read moreread less

153 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Journal Article•DOI•

Review of Particle Physics

[...]

Claude Amsler¹, Michael Doser², Mario Antonelli, D. M. Asner³ +173 more•Institutions (86)

01 Jul 1996-Physics Letters B

TL;DR: This biennial Review summarizes much of particle physics, using data from previous editions.

...read moreread less

12,798 citations

Journal Article•DOI•

A survey of socially interactive robots

[...]

Terrence Fong¹, Terrence Fong², Illah Nourbakhsh¹, Kerstin Dautenhahn³•Institutions (3)

Carnegie Mellon University¹, École Polytechnique Fédérale de Lausanne², University of Hertfordshire³

31 Mar 2003-Robotics and Autonomous Systems

TL;DR: The context for socially interactive robots is discussed, emphasizing the relationship to other research fields and the different forms of “social robots”, and a taxonomy of design methods and system components used to build socially interactive Robots is presented.

...read moreread less

2,869 citations

Journal Article•DOI•

Chitosan as Antimicrobial Agent: Applications and Mode of Action

[...]

Entsar I. Rabea¹, Mohamed E.-T. Badawy¹, Christian V. Stevens¹, Guy Smagghe¹, Walter Steurbaut¹ - Show less +1 more•Institutions (1)

Ghent University¹

03 Sep 2003-Biomacromolecules

TL;DR: The current review of 129 references describes the biological activity of several chitosan derivatives and the modes of action that have been postulated in the literature.

...read moreread less

2,615 citations

Journal Article•DOI•

A survey of deep neural network architectures and their applications

[...]

Weibo Liu¹, Zidong Wang¹, Xiaohui Liu¹, Nianyin Zeng², Yurong Liu³, Yurong Liu⁴, Fuad E. Alsaadi³ - Show less +3 more•Institutions (4)

Brunel University London¹, Xiamen University², King Abdulaziz University³, Yangzhou University⁴

19 Apr 2017-Neurocomputing

TL;DR: This work was supported in part by the Royal Society of the UK, the National Natural Science Foundation of China, and the Alexander von Humboldt Foundation of Germany.

...read moreread less

2,404 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse