Yasuo Ariki
Researcher at Kobe University
Publications - 340
Citations - 3838
Yasuo Ariki is an academic researcher at Kobe University. He has contributed to research topics including feature extraction and speaker recognition. He has an h-index of 25 and has co-authored 337 publications receiving 3,554 citations. His previous affiliations include the University of Edinburgh and Ryukoku University.
Papers
Book
Hidden Markov Models for Speech Recognition
TL;DR: This book presents a unified theory of hidden Markov models for speech recognition, covering vector quantization, mixture densities, and semi-continuous models, with experimental examples.
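At the core of HMM-based recognition is evaluating the likelihood of an observation sequence under a model, typically with the scaled forward algorithm. A minimal sketch for a discrete-observation HMM (the two-state model and all numbers below are illustrative, not taken from the book):

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm: log P(obs | HMM).
    pi: (N,) initial state probs; A: (N, N) transition probs;
    B: (N, M) emission probs; obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]            # initialise with first observation
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()          # rescale to avoid underflow
    for t in obs[1:]:
        alpha = (alpha @ A) * B[:, t]    # propagate states, then emit symbol t
        s = alpha.sum()
        log_lik += np.log(s)
        alpha = alpha / s
    return log_lik

# Toy 2-state, 2-symbol model (illustrative parameters only)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
print(forward_log_likelihood(pi, A, B, [0, 1, 1]))
```

The per-step rescaling is the standard trick that keeps the recursion numerically stable for long utterances while still recovering the exact log-likelihood.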
Proceedings ArticleDOI
Voice Conversion in High-order Eigen Space Using Deep Belief Nets
TL;DR: This paper presents a voice conversion technique that uses Deep Belief Nets (DBNs) to build high-order eigenspaces of the source/target speakers, in which the source speech is easier to convert to the target speech than in the traditional cepstrum space.
Proceedings ArticleDOI
Exemplar-based voice conversion in noisy environment
TL;DR: A voice conversion technique for noisy environments in which parallel exemplars encode the source speech signal and synthesize the target speech signal; its effectiveness is confirmed by comparison with a conventional Gaussian Mixture Model (GMM)-based method.
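Exemplar-based conversion of this kind is commonly realised with non-negative matrix factorisation: a source frame is decomposed as activations over a source exemplar dictionary, and the same activations are applied to the time-aligned target dictionary. A minimal sketch under that assumption (the random dictionaries and sizes below are illustrative, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parallel exemplar dictionaries: columns are
# time-aligned source/target spectral exemplars.
D_src = np.abs(rng.normal(size=(20, 8))) + 1e-3   # source dictionary
D_tgt = np.abs(rng.normal(size=(20, 8))) + 1e-3   # target dictionary

def convert(x, n_iter=200):
    """Estimate non-negative activations h with x ~= D_src @ h via
    multiplicative NMF updates, then reuse h on the target dictionary."""
    h = np.ones(D_src.shape[1])
    for _ in range(n_iter):
        h *= (D_src.T @ x) / (D_src.T @ (D_src @ h) + 1e-12)
    return D_tgt @ h, h

# A source frame built from two exemplars; conversion should recover
# roughly those two activations and map them into the target space.
x = D_src @ np.array([0.0] * 6 + [1.0, 2.0])
y, h = convert(x)
```

Because the activations, not the spectra, carry the conversion, additive noise can be absorbed by extending the source dictionary with noise exemplars, which is what makes the approach attractive in noisy conditions.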
Journal ArticleDOI
GMM-Based Emotional Voice Conversion Using Spectrum and Prosody Features
TL;DR: Both prosody and spectral (voice-quality) features are used to convert a neutral voice into an emotional voice, yielding more expressive voices than conventional methods that convert prosody or spectrum alone.
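GMM-based conversion of this family typically fits a joint GMM over paired source/target features and maps each source frame with the standard minimum mean-square-error regression function. A one-dimensional sketch of that mapping (the two-component parameters below are made up for illustration):

```python
import numpy as np

# Illustrative joint-GMM parameters for 1-D source/target features
# (two mixture components; numbers are invented for the sketch).
weights = np.array([0.5, 0.5])
mu_x = np.array([-1.0, 2.0])     # source means
mu_y = np.array([0.0, 3.0])      # target means
var_xx = np.array([0.5, 0.8])    # source variances
cov_yx = np.array([0.3, 0.6])    # source-target covariances

def gmm_convert(x):
    """MMSE GMM mapping:
    y = sum_m P(m|x) * (mu_y[m] + cov_yx[m] / var_xx[m] * (x - mu_x[m]))."""
    lik = weights * np.exp(-0.5 * (x - mu_x) ** 2 / var_xx) \
          / np.sqrt(2 * np.pi * var_xx)
    post = lik / lik.sum()       # component posteriors P(m | x)
    return float(post @ (mu_y + cov_yx / var_xx * (x - mu_x)))

print(gmm_convert(2.0))          # near the second component's target mean
```

In an emotional-conversion setting the same mapping is applied to spectral features and, separately, to prosodic features such as F0, which is what distinguishes this line of work from spectrum-only conversion.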
Journal ArticleDOI
Voice conversion using RNN pre-trained by recurrent temporal restricted Boltzmann machines
TL;DR: This paper presents a voice conversion method that uses the recently proposed probabilistic models called recurrent temporal restricted Boltzmann machines (RTRBMs) to pre-train the network; features of the source speaker are converted to those of the target speaker with a neural network (NN), so that the entire network acts as a deep recurrent NN and can be fine-tuned.