Yasuo Ariki
Researcher at Kobe University
Publications - 340
Citations - 3838
Yasuo Ariki is an academic researcher at Kobe University. He has contributed to research topics including feature extraction and speaker recognition. He has an h-index of 25 and has co-authored 337 publications receiving 3,554 citations. His previous affiliations include the University of Edinburgh and Ryukoku University.
Papers
Book
Hidden Markov Models for Speech Recognition
TL;DR: This book presents a unified theory of hidden Markov models for speech recognition, covering vector quantization, mixture densities, and semi-continuous models, with experimental examples.
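At the core of HMM-based recognition is evaluating the likelihood of an observation sequence under a model, typically with the scaled forward algorithm. A minimal sketch for a discrete-observation HMM (the two-state model and all numbers below are illustrative, not taken from the book):

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm: log P(obs | HMM).
    pi: (N,) initial state probs; A: (N, N) transition probs;
    B: (N, M) emission probs; obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]            # initialise with first observation
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()          # rescale to avoid underflow
    for t in obs[1:]:
        alpha = (alpha @ A) * B[:, t]    # propagate states, then emit symbol t
        s = alpha.sum()
        log_lik += np.log(s)
        alpha = alpha / s
    return log_lik

# Toy 2-state, 2-symbol model (illustrative parameters only)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
print(forward_log_likelihood(pi, A, B, [0, 1, 1]))
```

The per-step rescaling is the standard trick that keeps the recursion numerically stable for long utterances while still recovering the exact log-likelihood.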
Proceedings ArticleDOI
Voice Conversion in High-order Eigen Space Using Deep Belief Nets
TL;DR: This paper presents a voice conversion technique that uses Deep Belief Nets (DBNs) to build high-order eigenspaces of the source/target speakers, in which the source speech is easier to convert to the target speech than in the traditional cepstrum space.
Proceedings ArticleDOI
Exemplar-based voice conversion in noisy environment
TL;DR: A voice conversion technique for noisy environments in which parallel exemplars encode the source speech signal and synthesize the target speech signal; its effectiveness is confirmed by comparison with a conventional Gaussian Mixture Model (GMM)-based method.
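Exemplar-based conversion of this kind is commonly realised with non-negative matrix factorisation: a source frame is decomposed as activations over a source exemplar dictionary, and the same activations are applied to the time-aligned target dictionary. A minimal sketch under that assumption (the random dictionaries and sizes below are illustrative, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parallel exemplar dictionaries: columns are
# time-aligned source/target spectral exemplars.
D_src = np.abs(rng.normal(size=(20, 8))) + 1e-3   # source dictionary
D_tgt = np.abs(rng.normal(size=(20, 8))) + 1e-3   # target dictionary

def convert(x, n_iter=200):
    """Estimate non-negative activations h with x ~= D_src @ h via
    multiplicative NMF updates, then reuse h on the target dictionary."""
    h = np.ones(D_src.shape[1])
    for _ in range(n_iter):
        h *= (D_src.T @ x) / (D_src.T @ (D_src @ h) + 1e-12)
    return D_tgt @ h, h

# A source frame built from two exemplars; conversion should recover
# roughly those two activations and map them into the target space.
x = D_src @ np.array([0.0] * 6 + [1.0, 2.0])
y, h = convert(x)
```

Because the activations, not the spectra, carry the conversion, additive noise can be absorbed by extending the source dictionary with noise exemplars, which is what makes the approach attractive in noisy conditions.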
Journal ArticleDOI
GMM-Based Emotional Voice Conversion Using Spectrum and Prosody Features
TL;DR: Both prosody and spectral (voice-quality) features are used to convert a neutral voice into an emotional voice, yielding more expressive voices than conventional methods that convert prosody or spectrum alone.
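GMM-based conversion of this family typically fits a joint GMM over paired source/target features and maps each source frame with the standard minimum mean-square-error regression function. A one-dimensional sketch of that mapping (the two-component parameters below are made up for illustration):

```python
import numpy as np

# Illustrative joint-GMM parameters for 1-D source/target features
# (two mixture components; numbers are invented for the sketch).
weights = np.array([0.5, 0.5])
mu_x = np.array([-1.0, 2.0])     # source means
mu_y = np.array([0.0, 3.0])      # target means
var_xx = np.array([0.5, 0.8])    # source variances
cov_yx = np.array([0.3, 0.6])    # source-target covariances

def gmm_convert(x):
    """MMSE GMM mapping:
    y = sum_m P(m|x) * (mu_y[m] + cov_yx[m] / var_xx[m] * (x - mu_x[m]))."""
    lik = weights * np.exp(-0.5 * (x - mu_x) ** 2 / var_xx) \
          / np.sqrt(2 * np.pi * var_xx)
    post = lik / lik.sum()       # component posteriors P(m | x)
    return float(post @ (mu_y + cov_yx / var_xx * (x - mu_x)))

print(gmm_convert(2.0))          # near the second component's target mean
```

In an emotional-conversion setting the same mapping is applied to spectral features and, separately, to prosodic features such as F0, which is what distinguishes this line of work from spectrum-only conversion.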
Journal ArticleDOI
Voice conversion using RNN pre-trained by recurrent temporal restricted Boltzmann machines
TL;DR: This paper presents a voice conversion method that uses the recently proposed probabilistic models called recurrent temporal restricted Boltzmann machines (RTRBMs) to pre-train the network; features of the source speaker are converted to those of the target speaker with a neural network (NN), so that the entire network acts as a deep recurrent NN and can be fine-tuned.