Thomas Merritt
Researcher at Amazon.com
Publications - 33
Citations - 607
Thomas Merritt is a researcher at Amazon.com. The author has contributed to research on speech synthesis and hidden Markov models. The author has an h-index of 12 and has co-authored 33 publications receiving 465 citations. Previous affiliations of Thomas Merritt include the University of Edinburgh.
Papers
Proceedings Article
Towards achieving robust universal neural vocoding
Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal +7 more
TL;DR: A WaveRNN-based vocoder is shown to generate speech of consistently good quality, whether the input spectrogram comes from a speaker or style seen during training or from an out-of-domain scenario, provided the recording conditions are studio-quality.
Proceedings Article
From HMMs to DNNs: Where do the improvements come from?
TL;DR: It is found that replacing decision trees with DNNs and moving from state-level to frame-level predictions both significantly improve listeners' naturalness ratings of synthetic speech produced by the systems.
Proceedings Article
Deep neural network-guided unit selection synthesis
TL;DR: This paper demonstrates that the superiority of deep neural network (DNN) acoustic models over HMMs in conventional statistical parametric speech synthesis also carries over to hybrid synthesis.
Proceedings Article
Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech
TL;DR: Subjective listening tests show that taking the source and filter parameters to be conditionally independent, or using diagonal covariance matrices, significantly limits the naturalness that can be achieved.
Posted Content
Towards achieving robust universal neural vocoding
Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal +7 more
TL;DR: The authors trained a WaveRNN-based vocoder on 74 speakers from 17 languages and found that the results were consistent across languages, regardless of whether they were seen during training or unseen (e.g. Wolof, Swahili, Amharic).