
Yasuo Ariki

Researcher at Kobe University

Publications: 340
Citations: 3838

Yasuo Ariki is an academic researcher at Kobe University. His research focuses on feature extraction and speaker recognition. He has an h-index of 25 and has co-authored 337 publications receiving 3,554 citations. His previous affiliations include the University of Edinburgh and Ryukoku University.

Papers
Book

Hidden Markov Models for Speech Recognition

TL;DR: This book develops a unified theory of hidden Markov modelling for speech recognition, covering vector quantization, mixture-density HMMs, and semi-continuous models, with experimental examples.
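
As a concrete illustration of the book's core machinery, below is a minimal sketch of the scaled forward algorithm, which computes an utterance's log-likelihood under a discrete (vector-quantized) HMM. The 3-state left-to-right model and codebook values are made-up placeholders, not material from the book.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs : sequence of VQ codebook indices
    pi  : (N,) initial state probabilities
    A   : (N, N) state transition probabilities
    B   : (N, M) emission probabilities over M codebook symbols
    """
    alpha = pi * B[:, obs[0]]               # initialize with the first symbol
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()                    # rescale to avoid underflow
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]  # predict, then weight by emission
        s = alpha.sum()
        log_lik += np.log(s)
        alpha /= s
    return log_lik

pi = np.array([1.0, 0.0, 0.0])              # left-to-right model, start in state 0
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
print(forward_log_likelihood([0, 0, 1, 2, 2], pi, A, B))
```
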
Proceedings ArticleDOI

Voice Conversion in High-order Eigen Space Using Deep Belief Nets

TL;DR: This paper presents a voice conversion technique using Deep Belief Nets (DBNs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space.
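
As a rough illustration of the conversion step, the sketch below encodes a source cepstrum frame through a stack of sigmoid layers (standing in for the trained source-speaker DBN), applies a linear map in the high-order space, and decodes through the target-speaker stack. All weights (`enc`, `C`, `dec`) are random placeholders, not parameters from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def convert_frame(x_src, enc_layers, C, dec_layers):
    """Encode -> convert in the high-order space -> decode."""
    h = x_src
    for W, b in enc_layers:          # source-speaker DBN encoder
        h = sigmoid(h @ W + b)
    h = h @ C                        # mapping between the two eigen spaces
    for W, b in dec_layers[:-1]:     # target-speaker DBN decoder
        h = sigmoid(h @ W + b)
    W, b = dec_layers[-1]
    return h @ W + b                 # linear output: real-valued cepstrum

rng = np.random.default_rng(0)
enc = [(rng.normal(scale=0.1, size=(24, 64)), np.zeros(64)),
       (rng.normal(scale=0.1, size=(64, 32)), np.zeros(32))]
C = rng.normal(scale=0.1, size=(32, 32))
dec = [(rng.normal(scale=0.1, size=(32, 64)), np.zeros(64)),
       (rng.normal(scale=0.1, size=(64, 24)), np.zeros(24))]
print(convert_frame(rng.normal(size=24), enc, C, dec).shape)  # (24,)
```
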
Proceedings ArticleDOI

Exemplar-based voice conversion in noisy environment

TL;DR: A voice conversion technique for noisy environments in which parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal; its effectiveness is confirmed by comparison with a conventional Gaussian Mixture Model (GMM)-based method.
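
Exemplar-based conversion of this kind is commonly realized with non-negative matrix factorization (NMF): the noisy source spectrogram is decomposed over a dictionary of source exemplars, and the resulting activations are applied to the parallel target dictionary. The sketch below uses standard multiplicative updates for the Euclidean NMF cost; the dictionaries are random placeholders, and the paper's exact cost function and sparsity constraints may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
A_src = rng.random((64, 200))   # source exemplar dictionary (bins x exemplars)
A_tgt = rng.random((64, 200))   # parallel target exemplars, column-aligned
X = rng.random((64, 50))        # magnitude spectrogram of noisy source speech

H = rng.random((200, 50))       # activations shared by both dictionaries
for _ in range(200):            # multiplicative update: min ||X - A_src H||^2
    H *= (A_src.T @ X) / (A_src.T @ (A_src @ H) + 1e-9)

Y = A_tgt @ H                   # target spectrogram: same activations,
                                # target exemplars
```
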
Journal ArticleDOI

GMM-Based Emotional Voice Conversion Using Spectrum and Prosody Features

TL;DR: Both prosody and voice quality are used to convert a neutral voice into an emotional voice, yielding more expressive voices than conventional methods that convert prosody or spectrum alone.
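
For the spectral part, GMM-based conversion of this family typically fits a GMM on joint source-target feature vectors and maps each source frame to the posterior-weighted conditional mean. The sketch below shows that mapping on synthetic data; a real system would use time-aligned parallel utterances, and this paper additionally converts prosody. Dimensions and mixture count are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
D, M = 24, 4                                    # feature dim, mixtures
src = rng.normal(size=(4000, D))                # stand-in neutral features
tgt = src @ rng.normal(scale=0.1, size=(D, D)) + 0.1 * rng.normal(size=(4000, D))

gmm = GaussianMixture(n_components=M, covariance_type="full", random_state=0)
gmm.fit(np.hstack([src, tgt]))                  # joint density p(x, y)

def convert(x):
    """E[y | x] under the joint GMM (minimum mean-square-error mapping)."""
    post = np.zeros(M)
    for m in range(M):                          # p(m | x) from the x-marginal
        d = x - gmm.means_[m, :D]
        S_xx = gmm.covariances_[m][:D, :D]
        post[m] = gmm.weights_[m] * np.exp(-0.5 * d @ np.linalg.solve(S_xx, d)) \
                  / np.sqrt(np.linalg.det(S_xx))
    post /= post.sum()
    y = np.zeros(D)
    for m in range(M):                          # posterior-weighted conditional means
        d = x - gmm.means_[m, :D]
        S_xx = gmm.covariances_[m][:D, :D]
        S_yx = gmm.covariances_[m][D:, :D]
        y += post[m] * (gmm.means_[m, D:] + S_yx @ np.linalg.solve(S_xx, d))
    return y

print(convert(src[0])[:4])                      # converted frame (first 4 dims)
```
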
Journal ArticleDOI

Voice Conversion Using RNN Pre-trained by Recurrent Temporal Restricted Boltzmann Machines

TL;DR: This paper presents a voice conversion method built on the recently proposed recurrent temporal restricted Boltzmann machines (RTRBMs): an RTRBM captures high-order features for each speaker, and a neural network (NN) converts the source speaker's features to those of the target speaker, so that the entire network acts as a deep recurrent NN and can be fine-tuned.
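
The sketch below shows only the forward pass of the resulting recurrent converter: one recurrent layer mapping a source feature sequence to target-speaker features. The RTRBM pre-training and fine-tuning described in the paper are omitted; the random weights here stand in for RTRBM-initialized ones.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 24, 64                                 # feature dim, hidden units
W_in  = rng.normal(scale=0.1, size=(D, H))    # input -> hidden
W_rec = rng.normal(scale=0.1, size=(H, H))    # hidden -> hidden recurrence
W_out = rng.normal(scale=0.1, size=(H, D))    # hidden -> output

def convert_sequence(X):
    """Map a (T, D) source feature sequence to target-speaker features."""
    h = np.zeros(H)
    Y = np.empty_like(X)
    for t, x in enumerate(X):
        h = np.tanh(x @ W_in + h @ W_rec)     # recurrent hidden state
        Y[t] = h @ W_out                      # linear output frame
    return Y

print(convert_sequence(rng.normal(size=(5, D))).shape)  # (5, 24)
```
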