Author

Akinobu Lee

Bio: Akinobu Lee is an academic researcher from Nagoya Institute of Technology. The author has contributed to research in topics including hidden Markov models and acoustic models. The author has an h-index of 22 and has co-authored 91 publications receiving 2,500 citations. Previous affiliations of Akinobu Lee include Nara Institute of Science and Technology and Kyoto University.


Papers
Proceedings Article
01 Sep 2001
Abstract: EUROSPEECH2001: the 7th European Conference on Speech Communication and Technology, September 3-7, 2001, Aalborg, Denmark.

592 citations

Proceedings Article
04 Oct 2009
TL;DR: An overview of Julius and its major features and specifications is given, and the developments conducted in recent years are summarized.
Abstract: Julius is open-source large-vocabulary speech recognition software used for both academic research and industrial applications. It performs real-time speech recognition of a 60k-word dictation task on low-spec PCs with a small footprint, and even on embedded devices. Julius supports standard language models such as statistical N-gram models and rule-based grammars, as well as Hidden Markov Models (HMMs) as acoustic models. One can build a speech recognition system for one's own purpose, or integrate the speech recognition capability into a variety of applications using Julius. This article gives an overview of Julius, describes its major features and specifications, and summarizes the developments conducted in recent years.

325 citations
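
For integration, Julius provides a module mode (server started with the -module option) that streams recognition results to clients over a TCP socket, by default on port 10500. Below is a minimal sketch of a Python client for that protocol, assuming a locally running server; the framing (XML-like messages, each terminated by a line containing a single ".") follows the documented module protocol, though details may vary by Julius version.

import socket

# Connect to a Julius server started with e.g.: julius -C app.jconf -module
# ("app.jconf" is a hypothetical configuration file; 10500 is the
# documented default port for module mode)
HOST, PORT = "localhost", 10500

with socket.create_connection((HOST, PORT)) as sock:
    buf = ""
    while True:
        data = sock.recv(4096)
        if not data:
            break
        buf += data.decode("utf-8", errors="replace")
        # Each module message ends with a line containing only ".";
        # split complete messages off the front of the buffer.
        while "\n.\n" in buf:
            msg, buf = buf.split("\n.\n", 1)
            # Final recognition results arrive inside <RECOGOUT> ... tags.
            if "<RECOGOUT>" in msg:
                print(msg)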

Journal ArticleDOI
TL;DR: The signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions, and the temporal alternation between ICA and beamforming yields optimization that converges both quickly and to a better solution.
Abstract: We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the slow-convergence problem of the optimization in ICA. The proposed method consists of the following three parts: (a) frequency-domain ICA with direction-of-arrival (DOA) estimation, (b) null beamforming based on the estimated DOA, and (c) integration of (a) and (b) based on algorithm diversity in both the iteration and frequency domains. The unmixing matrix obtained by ICA is temporally substituted by the matrix based on null beamforming through iterative optimization, and this temporal alternation between ICA and beamforming yields optimization that converges both quickly and to a better solution. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions.

226 citations
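
As a rough illustration of the alternation described above, the sketch below shows, for a single frequency bin and two sources, (a) one natural-gradient ICA update and (b) a null-beamformer unmixing matrix built from two DOA estimates. Both are standard textbook constructions, not the authors' implementation; the array geometry (two microphones at spacing d), the step size mu, and the DOA-estimation step (omitted here) are assumptions of this sketch.

import numpy as np

def ica_step(W, Y, mu=0.1):
    # One natural-gradient ICA update for a single frequency bin.
    # W: (2, 2) complex unmixing matrix; Y: (2, T) separated frames Y = W @ X.
    phi = Y / (np.abs(Y) + 1e-9)                  # complex "sign" nonlinearity
    grad = np.eye(2) - (phi @ Y.conj().T) / Y.shape[1]
    return W + mu * grad @ W

def null_beamformer(thetas, freq, d=0.04, c=343.0):
    # (2, 2) unmixing matrix for a two-microphone array (spacing d metres):
    # each output channel steers a spatial null onto the other source's DOA.
    # thetas: two DOA estimates in radians (assumed given, e.g. from ICA).
    def steer(theta):
        tau = d * np.sin(theta) / c               # inter-microphone delay
        return np.array([1.0, np.exp(-2j * np.pi * freq * tau)])
    A = np.stack([steer(thetas[0]), steer(thetas[1])], axis=1)  # mixing estimate
    return np.linalg.inv(A)

# Per the abstract, the matrix from ica_step is temporally substituted by
# null_beamformer's matrix during the iterative optimization, alternating
# over both iterations and frequency bins.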

Proceedings Article
01 Jan 2006
TL;DR: A corpus-based singing voice synthesis system based on hidden Markov models (HMMs) that employs HMM-based speech synthesis to synthesize smooth and natural-sounding singing voice.
Abstract: The present paper describes a corpus-based singing voice synthesis system based on hidden Markov models (HMMs). This system employs HMM-based speech synthesis to synthesize singing voice. Musical information such as lyrics, tones, and durations is modeled simultaneously in a unified framework of the context-dependent HMM. It can mimic the voice quality and singing style of the original singer. Results of a singing voice synthesis experiment show that the proposed system can synthesize smooth and natural-sounding singing voice.

Index Terms: singing voice synthesis, HMM, time-lag model.

1. Introduction

In recent years, various applications of speech synthesis systems have been proposed and investigated. Singing voice synthesis is one of the hot topics in this area [1-5]. However, only a few corpus-based singing voice synthesis systems which can be constructed automatically have been proposed. Currently, there are two main paradigms in the corpus-based speech synthesis area: the sample-based approach and the statistical approach. The sample-based approach, such as unit selection [6], can synthesize high-quality speech. However, it requires a huge amount of training data to realize various voice characteristics. On the other hand, the quality of the statistical approach, such as HMM-based speech synthesis [7], is buzzy because it is based on a vocoding technique. However, it is smooth and stable, and its voice characteristics can easily be modified by transforming HMM parameters appropriately. For singing voice synthesis, applying unit selection seems to be difficult because a huge amount of singing speech, covering the vast combinations of contextual factors that affect singing voice, has to be recorded. On the other hand, the HMM-based system can be constructed using a relatively small amount of training data. From this point of view, the HMM-based approach seems to be more suitable for the singing voice synthesizer. In the present paper, we apply the HMM-based synthesis approach to singing voice synthesis. Although the singing voice synthesis system proposed in the present paper is quite similar to the HMM-based text-to-speech synthesis system [7], there are two main differences between them. In the HMM-based text-to-speech synthesis system, contextual factors which may affect reading speech (e.g. phonemes, syllables, words, phrases, etc.) are taken into account. However, contextual factors which may affect singing voice should be different

127 citations
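
For background on where the reported smoothness comes from: HMM-based synthesis generates parameter trajectories with the standard maximum-likelihood parameter-generation algorithm under dynamic-feature constraints (sketched here in its usual form, not as this paper's specific formulation). With static-plus-dynamic observations o = Wc and Gaussian state outputs with mean vector \mu and covariance \Sigma along the chosen state sequence, the static trajectory is

\hat{c} = \arg\max_{c} \mathcal{N}(Wc;\, \mu, \Sigma) = \bigl(W^{\top}\Sigma^{-1}W\bigr)^{-1} W^{\top}\Sigma^{-1}\mu,

which yields smooth trajectories, since W ties each frame's static features to its neighbours through the dynamic (delta) features.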

Proceedings Article
01 Oct 2000
Abstract: ICSLP2000: the 6th International Conference on Spoken Language Processing, October 16-20, 2000, Beijing, China.

105 citations


Cited by
Proceedings Article
01 Jan 2011
TL;DR: Describes the design of Kaldi, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Abstract: We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

5,857 citations
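
The finite-state design mentioned above follows the standard WFST recipe described in Kaldi's documentation: the decoding graph is the composition of four transducers (shown here without the determinization and minimization steps applied in practice),

HCLG = H \circ C \circ L \circ G,

where H encodes the HMM topology, C the phonetic context-dependency, L the pronunciation lexicon, and G the grammar or language model.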

Proceedings ArticleDOI
30 Mar 2018
TL;DR: In this paper, a new open-source platform for end-to-end speech processing named ESPnet is introduced; it focuses mainly on end-to-end automatic speech recognition (ASR) and adopts the widely used dynamic neural network toolkits Chainer and PyTorch as its main deep learning engines.
Abstract: This paper introduces a new open-source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts the widely used dynamic neural network toolkits Chainer and PyTorch as its main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains the major architecture of this software platform, several important functionalities that differentiate ESPnet from other open-source ASR toolkits, and experimental results with major ASR benchmarks.

806 citations
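
A detail the abstract leaves implicit: ESPnet's recognizers are trained with the hybrid CTC/attention objective (Watanabe et al.), an interpolation of the two losses with a weight \lambda \in [0, 1],

\mathcal{L}_{\mathrm{MTL}} = \lambda\, \mathcal{L}_{\mathrm{CTC}} + (1 - \lambda)\, \mathcal{L}_{\mathrm{att}},

where the CTC branch enforces monotonic alignment and the attention branch models label dependencies; the same interpolation is typically reused for joint scoring at decoding time.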

Journal ArticleDOI
TL;DR: An overview of HSMMs is presented, covering modelling, inference, estimation, implementation, and applications; HSMMs have been applied in thirty scientific and engineering areas, including speech recognition/synthesis, human activity recognition/prediction, handwriting recognition, functional MRI brain mapping, and network anomaly detection.

734 citations
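
The distinction driving the survey, stated in its usual form: a plain HMM's self-transition probability a_{jj} forces a geometric state-duration distribution, whereas an HSMM attaches an explicit duration distribution p_j(d) to each state j,

P_{\mathrm{HMM}}(d \mid j) = a_{jj}^{\,d-1}\,(1 - a_{jj}), \qquad P_{\mathrm{HSMM}}(d \mid j) = p_j(d),

so a state emits a whole segment of length d before transitioning, at the cost of more expensive inference.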
