Author
Kazuo Hiyane
Bio: Kazuo Hiyane is an academic researcher at Mitsubishi Research Institute. The author has contributed to research on sound and microphone arrays, has an h-index of 7, and has co-authored 14 publications receiving 366 citations.
Papers
•
01 May 2000
Abstract: LREC2000: the 2nd International Conference on Language Resources and Evaluation, May 31 - June 2, 2000, Athens, Greece.
259 citations
•
41 citations
•
01 Sep 1999
Abstract: EUROSPEECH1999: the 6th European Conference on Speech Communication and Technology, September 5-9, 1999, Budapest, Hungary.
30 citations
•
01 Jan 2002
21 citations
•
07 Nov 2002
TL;DR: Describes the progress of the sound scene database collection project and its application to environmental sound recognition and hands-free speech recognition.
Abstract: Sound data for open evaluation are necessary for studies such as sound source localization, sound retrieval, sound recognition, and hands-free speech recognition in real acoustic environments. This paper reports on our project for acoustic data collection. There are many kinds of sound scenes in real environments. A sound scene is specified by its sound sources and room acoustics, and the number of combinations of sound sources, source positions, and rooms in real acoustic environments is huge. We assumed that the sound in these environments can be simulated by convolving isolated sound sources with impulse responses. As isolated sound sources, a hundred kinds of environmental sounds and speech sounds were collected. The impulse responses were collected in various acoustic environments, and we additionally collected sounds from a moving source. In this paper, the progress of our sound scene database collection project and its application to environmental sound recognition and hands-free speech recognition are described.
13 citations
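The convolution-based simulation this abstract describes reduces to convolving an isolated (dry) source recording with a measured room impulse response. A minimal sketch in Python, assuming hypothetical mono 16-bit WAV file names:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs_src, dry = wavfile.read("isolated_source.wav")   # dry source recording (hypothetical file)
fs_rir, rir = wavfile.read("room_response.wav")     # measured impulse response (hypothetical file)
assert fs_src == fs_rir, "source and RIR must share one sample rate"

dry = dry.astype(np.float64)
rir = rir.astype(np.float64)

# The reverberant scene is the linear convolution of source and RIR.
scene = fftconvolve(dry, rir)
scene /= np.max(np.abs(scene)) + 1e-12              # normalize to avoid clipping

wavfile.write("simulated_scene.wav", fs_src, (scene * 32767).astype(np.int16))
```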
Cited by
•
05 Mar 2017
TL;DR: It is found that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added, and the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario.
Abstract: The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real RIRs can be difficult to acquire, and also the effect of adding point-source noises. We find that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added. Further we show that the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario. We evaluate our approach on several LVCSR tasks which can adequately represent both scenarios.
781 citations
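The augmentation recipe summarized above amounts to convolving clean speech with a (possibly simulated) RIR, then adding a point-source noise, itself convolved with an RIR from the noise position, at a target SNR. A hedged sketch; the function name, array inputs, and 20 dB default SNR are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from scipy.signal import fftconvolve

def augment(speech, speech_rir, noise, noise_rir, snr_db=20.0):
    """Return a far-field mixture: reverberant speech plus scaled reverberant noise."""
    rev_speech = fftconvolve(speech, speech_rir)
    rev_noise = fftconvolve(noise, noise_rir)

    # Trim to a common length before mixing.
    n = min(len(rev_speech), len(rev_noise))
    rev_speech, rev_noise = rev_speech[:n], rev_noise[:n]

    # Scale the noise so the mixture hits the requested SNR.
    p_speech = np.mean(rev_speech ** 2)
    p_noise = np.mean(rev_noise ** 2) + 1e-12
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return rev_speech + gain * rev_noise
```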
•
TL;DR: This paper proposes to analyze a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering.
Abstract: Speech enhancement and separation are core problems in audio signal processing, with commercial applications in devices as diverse as mobile phones, conference call systems, hands-free systems, or hearing aids. In addition, they are crucial preprocessing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones. The enhancement and separation capabilities offered by these multichannel interfaces are usually greater than those of single-channel interfaces. Research in speech enhancement and separation has followed two convergent paths, starting with microphone array processing and blind source separation, respectively. These communities are now strongly interrelated and routinely borrow ideas from each other. Yet, a comprehensive overview of the common foundations and the differences between these approaches is lacking at present. In this paper, we propose to fill this gap by analyzing a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering. We conclude this overview paper by providing a list of software and data resources and by discussing perspectives and future trends in the field.
452 citations
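As one concrete point on the survey's spatial-filter-design axis, a frequency-domain delay-and-sum beamformer is about the simplest multichannel enhancement filter. The sketch below assumes a uniform linear array; the geometry, sample rate, and sign convention are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def delay_and_sum(X, angle_deg, mic_spacing=0.05, fs=16000, c=343.0):
    """X: complex STFT tensor of shape (mics, freq_bins, frames). Returns enhanced STFT."""
    n_mics, n_freq, _ = X.shape
    freqs = np.linspace(0.0, fs / 2.0, n_freq)
    # Per-microphone arrival delays for a plane wave from angle_deg.
    delays = np.arange(n_mics) * mic_spacing * np.sin(np.deg2rad(angle_deg)) / c
    # Steering vector: undo each channel's phase delay, then average across mics.
    steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])  # (mics, freq)
    return np.mean(steer[:, :, None] * X, axis=0)
```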
•
18 Nov 2011
TL;DR: Presents stable and fast update rules for independent vector analysis (IVA) based on an auxiliary function technique, which yield faster convergence and better results than natural gradient updates.
Abstract: This paper presents stable and fast update rules for independent vector analysis (IVA) based on an auxiliary function technique. The algorithm consists of two alternating updates: 1) weighted covariance matrix updates and 2) demixing matrix updates, which involve no tuning parameters such as a step size. The monotonic decrease of the objective function at each update is guaranteed. The experimental evaluation shows that the derived update rules yield faster convergence and better results than natural gradient updates.
308 citations
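The two alternating updates summarized above can be sketched compactly. This is a hedged sketch, not the authors' reference implementation; the tensor layout, iteration count, and epsilon are assumptions, and the per-source scale ambiguity is left unresolved:

```python
import numpy as np

def auxiva(X, n_iter=30, eps=1e-12):
    """X: complex STFT tensor, shape (freqs, frames, mics). Returns separated STFT."""
    F, T, M = X.shape
    W = np.tile(np.eye(M, dtype=complex), (F, 1, 1))      # per-frequency demixing matrices
    for _ in range(n_iter):
        Y = np.einsum("fkm,ftm->ftk", W, X)               # current source estimates
        # Source activity r_k[t]: l2-norm across frequency (spherical source model).
        r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0)) + eps  # (frames, mics)
        for k in range(M):
            # 1) Weighted covariance matrix update for source k.
            Xw = X / r[None, :, k, None]
            V = np.einsum("ftm,ftn->fmn", Xw, X.conj()) / T
            # 2) Demixing vector update: w_k = (W V_k)^-1 e_k, then normalize.
            w = np.linalg.inv(W @ V)[:, :, k]              # (freqs, mics)
            norm = np.sqrt(np.einsum("fm,fmn,fn->f", w.conj(), V, w).real) + eps
            W[:, k, :] = (w / norm[:, None]).conj()
    return np.einsum("fkm,ftm->ftk", W, X)
```

Note that, matching the abstract, neither update carries a step size, and each iteration is guaranteed not to increase the objective.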
•
TL;DR: This paper addresses the determined blind source separation problem and proposes a new effective method unifying independent vector analysis (IVA) and nonnegative matrix factorization (NMF) based on conventional multichannel NMF (MNMF), which reveals the relationship between MNMF and IVA.
Abstract: This paper addresses the determined blind source separation problem and proposes a new effective method unifying independent vector analysis (IVA) and nonnegative matrix factorization (NMF). IVA is a state-of-the-art technique that utilizes the statistical independence between sources in a mixture signal, and an efficient optimization scheme has been proposed for IVA. However, since the source model in IVA is based on a spherical multivariate distribution, IVA cannot utilize specific spectral structures such as the harmonic structures of pitched instrumental sounds. To solve this problem, we introduce NMF decomposition as the source model in IVA to capture the spectral structures. The formulation of the proposed method is derived from conventional multichannel NMF (MNMF), which reveals the relationship between MNMF and IVA. The proposed method can be optimized by the update rules of IVA and single-channel NMF. Experimental results show the efficacy of the proposed method compared with IVA and MNMF in terms of separation accuracy and convergence speed.
296 citations
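The proposed method (ILRMA) alternates IVA-style demixing updates, as sketched under the previous entry, with single-channel NMF updates of each source's spectrogram model. A hedged sketch of just the NMF half, using Itakura-Saito multiplicative updates on a power spectrogram; the rank, iteration count, and random seeding are illustrative assumptions:

```python
import numpy as np

def is_nmf(P, rank=4, n_iter=100, eps=1e-12):
    """Itakura-Saito NMF of a power spectrogram P (freqs x frames)."""
    F, N = P.shape
    rng = np.random.default_rng(0)
    T = rng.uniform(size=(F, rank)) + eps   # spectral basis vectors
    V = rng.uniform(size=(rank, N)) + eps   # temporal activations
    for _ in range(n_iter):
        R = T @ V + eps                      # low-rank model of P
        # Multiplicative updates that monotonically decrease the IS divergence.
        T *= np.sqrt(((P / R ** 2) @ V.T) / ((1.0 / R) @ V.T))
        R = T @ V + eps
        V *= np.sqrt((T.T @ (P / R ** 2)) / (T.T @ (1.0 / R)))
    return T, V
```

In ILRMA, the model T @ V replaces the spherical per-source variance used in plain IVA, which is what lets it capture harmonic spectral structure.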
•
TL;DR: A sound event classification framework is outlined that compares auditory image front-end features with spectrogram image-based front-end features, using support vector machine and deep neural network classifiers, and is shown to compare very well with current state-of-the-art classification techniques.
Abstract: The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise. This paper outlines a sound event classification framework that compares auditory image front-end features with spectrogram image-based front-end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task in different levels of corrupting noise, and with several system enhancements, and is shown to compare very well with current state-of-the-art classification techniques.
239 citations
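A minimal sketch of the spectrogram-image front end this abstract compares: log-power spectrograms reduced to fixed-size "images", flattened, and fed to an SVM. The STFT settings, feature size, and classifier configuration are assumptions, not the paper's exact setup:

```python
import numpy as np
from scipy.signal import stft
from sklearn.svm import SVC

def spectrogram_image(signal, fs=16000, out_shape=(48, 48)):
    """Fixed-size, normalized log-power spectrogram image, flattened to a vector."""
    _, _, Z = stft(signal, fs=fs, nperseg=512, noverlap=256)
    img = np.log(np.abs(Z) ** 2 + 1e-12)
    # Crude nearest-neighbor downsampling to a fixed-size image.
    rows = np.linspace(0, img.shape[0] - 1, out_shape[0]).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, out_shape[1]).astype(int)
    img = img[np.ix_(rows, cols)]
    return ((img - img.mean()) / (img.std() + 1e-12)).ravel()

# Usage with hypothetical labeled clips:
#   features = np.stack([spectrogram_image(s) for s in clips])
#   clf = SVC().fit(features, labels)
```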