Author

Akinobu Lee

Bio: Akinobu Lee is an academic researcher from Nagoya Institute of Technology. The author has contributed to research in topics including hidden Markov models and acoustic models. The author has an h-index of 22 and has co-authored 91 publications receiving 2,500 citations. Previous affiliations of Akinobu Lee include Nara Institute of Science and Technology and Kyoto University.


Papers
Proceedings Article
01 Sep 2001
Abstract: EUROSPEECH2001: the 7th European Conference on Speech Communication and Technology, September 3-7, 2001, Aalborg, Denmark.

592 citations

Proceedings Article
04 Oct 2009
TL;DR: An overview of Julius and its major features and specifications is given, and the developments conducted in recent years are summarized.
Abstract: Julius is open-source large-vocabulary speech recognition software used for both academic research and industrial applications. It performs real-time speech recognition of a 60k-word dictation task on low-spec PCs with a small footprint, and even on embedded devices. Julius supports standard language models such as statistical N-gram models and rule-based grammars, as well as Hidden Markov Models (HMMs) as acoustic models. One can build a speech recognition system for one's own purpose, or integrate the speech recognition capability into a variety of applications using Julius. This article gives an overview of Julius, describes its major features and specifications, and summarizes the developments conducted in recent years.

325 citations
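
For integration, Julius provides a module mode (server started with the -module option) that streams recognition results to clients over a TCP socket, by default on port 10500. Below is a minimal sketch of a Python client for that protocol, assuming a locally running server; the framing (XML-like messages, each terminated by a line containing a single ".") follows the documented module protocol, though details may vary by Julius version.

import socket

# Connect to a Julius server started with e.g.: julius -C app.jconf -module
# ("app.jconf" is a hypothetical configuration file; 10500 is the
# documented default port for module mode)
HOST, PORT = "localhost", 10500

with socket.create_connection((HOST, PORT)) as sock:
    buf = ""
    while True:
        data = sock.recv(4096)
        if not data:
            break
        buf += data.decode("utf-8", errors="replace")
        # Each module message ends with a line containing only ".";
        # split complete messages off the front of the buffer.
        while "\n.\n" in buf:
            msg, buf = buf.split("\n.\n", 1)
            # Final recognition results arrive inside <RECOGOUT> ... tags.
            if "<RECOGOUT>" in msg:
                print(msg)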

Journal ArticleDOI
TL;DR: The signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions, and the temporal alternation between ICA and beamforming yields optimization that converges both quickly and to a better solution.
Abstract: We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the slow-convergence problem of the optimization in ICA. The proposed method consists of the following three parts: (a) frequency-domain ICA with direction-of-arrival (DOA) estimation, (b) null beamforming based on the estimated DOA, and (c) integration of (a) and (b) based on algorithm diversity in both the iteration and frequency domains. The unmixing matrix obtained by ICA is temporally substituted by the matrix based on null beamforming through iterative optimization, and this temporal alternation between ICA and beamforming yields optimization that converges both quickly and to a better solution. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, even under reverberant conditions.

226 citations
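
As a rough illustration of the alternation described above, the sketch below shows, for a single frequency bin and two sources, (a) one natural-gradient ICA update and (b) a null-beamformer unmixing matrix built from two DOA estimates. Both are standard textbook constructions, not the authors' implementation; the array geometry (two microphones at spacing d), the step size mu, and the DOA-estimation step (omitted here) are assumptions of this sketch.

import numpy as np

def ica_step(W, Y, mu=0.1):
    # One natural-gradient ICA update for a single frequency bin.
    # W: (2, 2) complex unmixing matrix; Y: (2, T) separated frames Y = W @ X.
    phi = Y / (np.abs(Y) + 1e-9)                  # complex "sign" nonlinearity
    grad = np.eye(2) - (phi @ Y.conj().T) / Y.shape[1]
    return W + mu * grad @ W

def null_beamformer(thetas, freq, d=0.04, c=343.0):
    # (2, 2) unmixing matrix for a two-microphone array (spacing d metres):
    # each output channel steers a spatial null onto the other source's DOA.
    # thetas: two DOA estimates in radians (assumed given, e.g. from ICA).
    def steer(theta):
        tau = d * np.sin(theta) / c               # inter-microphone delay
        return np.array([1.0, np.exp(-2j * np.pi * freq * tau)])
    A = np.stack([steer(thetas[0]), steer(thetas[1])], axis=1)  # mixing estimate
    return np.linalg.inv(A)

# Per the abstract, the matrix from ica_step is temporally substituted by
# null_beamformer's matrix during the iterative optimization, alternating
# over both iterations and frequency bins.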

Proceedings Article
01 Jan 2006
TL;DR: A corpus-based singing voice synthesis system based on hidden Markov models (HMMs) that employs HMM-based speech synthesis to synthesize smooth and natural-sounding singing voice.
Abstract: The present paper describes a corpus-based singing voice synthesis system based on hidden Markov models (HMMs). This system employs HMM-based speech synthesis to synthesize singing voice. Musical information such as lyrics, tones, and durations is modeled simultaneously in a unified framework of the context-dependent HMM. It can mimic the voice quality and singing style of the original singer. Results of a singing voice synthesis experiment show that the proposed system can synthesize smooth and natural-sounding singing voice.

Index Terms: singing voice synthesis, HMM, time-lag model.

1. Introduction

In recent years, various applications of speech synthesis systems have been proposed and investigated. Singing voice synthesis is one of the hot topics in this area [1-5]. However, only a few corpus-based singing voice synthesis systems which can be constructed automatically have been proposed. Currently, there are two main paradigms in the corpus-based speech synthesis area: the sample-based approach and the statistical approach. The sample-based approach, such as unit selection [6], can synthesize high-quality speech. However, it requires a huge amount of training data to realize various voice characteristics. On the other hand, the quality of the statistical approach, such as HMM-based speech synthesis [7], is buzzy because it is based on a vocoding technique. However, it is smooth and stable, and its voice characteristics can easily be modified by transforming HMM parameters appropriately. For singing voice synthesis, applying unit selection seems to be difficult because a huge amount of singing speech, covering the vast combinations of contextual factors that affect singing voice, has to be recorded. On the other hand, the HMM-based system can be constructed using a relatively small amount of training data. From this point of view, the HMM-based approach seems to be more suitable for the singing voice synthesizer. In the present paper, we apply the HMM-based synthesis approach to singing voice synthesis. Although the singing voice synthesis system proposed in the present paper is quite similar to the HMM-based text-to-speech synthesis system [7], there are two main differences between them. In the HMM-based text-to-speech synthesis system, contextual factors which may affect reading speech (e.g. phonemes, syllables, words, phrases, etc.) are taken into account. However, contextual factors which may affect singing voice should be different

127 citations
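
For background on where the reported smoothness comes from: HMM-based synthesis generates parameter trajectories with the standard maximum-likelihood parameter-generation algorithm under dynamic-feature constraints (sketched here in its usual form, not as this paper's specific formulation). With static-plus-dynamic observations o = Wc and Gaussian state outputs with mean vector \mu and covariance \Sigma along the chosen state sequence, the static trajectory is

\hat{c} = \arg\max_{c} \mathcal{N}(Wc;\, \mu, \Sigma) = \bigl(W^{\top}\Sigma^{-1}W\bigr)^{-1} W^{\top}\Sigma^{-1}\mu,

which yields smooth trajectories, since W ties each frame's static features to its neighbours through the dynamic (delta) features.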

Proceedings Article
01 Oct 2000
Abstract: ICSLP2000: the 6th International Conference on Spoken Language Processing, October 16-20, 2000, Beijing, China.

105 citations


Cited by
Proceedings Article
01 Jan 2011
TL;DR: Describes the design of Kaldi, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Abstract: We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

5,857 citations
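
The finite-state design mentioned above follows the standard WFST recipe described in Kaldi's documentation: the decoding graph is the composition of four transducers (shown here without the determinization and minimization steps applied in practice),

HCLG = H \circ C \circ L \circ G,

where H encodes the HMM topology, C the phonetic context-dependency, L the pronunciation lexicon, and G the grammar or language model.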

Proceedings ArticleDOI
30 Mar 2018
TL;DR: In this paper, a new open-source platform for end-to-end speech processing named ESPnet is introduced; it focuses mainly on end-to-end automatic speech recognition (ASR) and adopts the widely used dynamic neural network toolkits Chainer and PyTorch as its main deep learning engines.
Abstract: This paper introduces a new open-source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts the widely used dynamic neural network toolkits Chainer and PyTorch as its main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains the major architecture of this software platform, several important functionalities that differentiate ESPnet from other open-source ASR toolkits, and experimental results with major ASR benchmarks.

806 citations
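
A detail the abstract leaves implicit: ESPnet's recognizers are trained with the hybrid CTC/attention objective (Watanabe et al.), an interpolation of the two losses with a weight \lambda \in [0, 1],

\mathcal{L}_{\mathrm{MTL}} = \lambda\, \mathcal{L}_{\mathrm{CTC}} + (1 - \lambda)\, \mathcal{L}_{\mathrm{att}},

where the CTC branch enforces monotonic alignment and the attention branch models label dependencies; the same interpolation is typically reused for joint scoring at decoding time.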

Journal ArticleDOI
TL;DR: An overview of HSMMs is presented, covering modelling, inference, estimation, implementation, and applications; HSMMs have been applied in thirty scientific and engineering areas, including speech recognition/synthesis, human activity recognition/prediction, handwriting recognition, functional MRI brain mapping, and network anomaly detection.

734 citations
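
The distinction driving the survey, stated in its usual form: a plain HMM's self-transition probability a_{jj} forces a geometric state-duration distribution, whereas an HSMM attaches an explicit duration distribution p_j(d) to each state j,

P_{\mathrm{HMM}}(d \mid j) = a_{jj}^{\,d-1}\,(1 - a_{jj}), \qquad P_{\mathrm{HSMM}}(d \mid j) = p_j(d),

so a state emits a whole segment of length d before transitioning, at the cost of more expensive inference.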
