SciSpace (formerly Typeset)

Roberto Barra-Chicote

Researcher at Amazon.com

Publications: 66
Citations: 984

Roberto Barra-Chicote is a researcher at Amazon.com. His work focuses on speech synthesis and computer science. He has an h-index of 17 and has co-authored 58 publications receiving 707 citations. His previous affiliations include the Technical University of Madrid.

Papers
Proceedings ArticleDOI

Improvements to Prosodic Alignment for Automatic Dubbing

TL;DR: This paper improves the prosodic alignment component of an automatic-dubbing architecture. Compared to previous work, the enhanced prosodic alignment significantly improves prosodic accuracy and yields segmentation that is perceptibly better than, or on par with, manually annotated reference segmentation.
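Prosodic alignment in this context means segmenting the translated script so that its phrases line up with the pauses and segment durations of the original speech. A minimal sketch of that idea, using a brute-force squared-error match of relative segment lengths (all names and the scoring function are illustrative, not the paper's actual method):

```python
from itertools import combinations

def segment_translation(words, target_ratios):
    """Split `words` into len(target_ratios) contiguous segments whose
    relative character lengths best match `target_ratios`, i.e. the
    relative durations of the source-speech segments between pauses.

    Brute-force over break points for clarity; the paper's approach
    uses richer features (speaking rate, pause placement) and search.
    """
    n, k = len(words), len(target_ratios)
    lengths = [len(w) for w in words]
    total = sum(lengths)
    best, best_cost = None, float("inf")
    # Try every way to place k-1 break points between the n words.
    for breaks in combinations(range(1, n), k - 1):
        bounds = (0,) + breaks + (n,)
        cost = 0.0
        for i in range(k):
            seg_len = sum(lengths[bounds[i]:bounds[i + 1]]) / total
            cost += (seg_len - target_ratios[i]) ** 2
        if cost < best_cost:
            best, best_cost = bounds, cost
    return [words[best[i]:best[i + 1]] for i in range(k)]

# Example: split a 5-word translation into two segments whose relative
# lengths approximate source segments covering 40% and 60% of the audio.
result = segment_translation(
    ["hello", "there", "my", "good", "friend"], [0.4, 0.6]
)
```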
Proceedings ArticleDOI

From Speech-to-Speech Translation to Automatic Dubbing

TL;DR: In this paper, the authors present enhancements to a speech-to-speech translation pipeline that enable automatic dubbing of TED Talks from English into Italian. They measure the perceived naturalness of the automatic dubbing and the relative importance of each proposed enhancement.
Posted Content

Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech.

TL;DR: The proposed Text-to-Speech method creates an unseen expressive style from a single utterance of expressive speech around one second long. It achieves a 22% KL-divergence reduction while jointly improving perceptual metrics over the state of the art.
Book ChapterDOI

Towards Cross-Lingual Emotion Transplantation

TL;DR: The aim is to learn the nuances of emotional speech in a source language with enough data to adapt an acceptable-quality emotional model by means of CSMAPLR adaptation, and then convert the adaptation function so it can be applied to a different target speaker in a target language, maintaining the speaker's identity while adding emotional information.
Proceedings ArticleDOI

Using VAEs and Normalizing Flows for One-Shot Text-To-Speech Synthesis of Expressive Speech

TL;DR: This article proposes a Text-to-Speech method to create an unseen expressive style from a single utterance of expressive speech around one second long. It enhances the disentanglement capabilities of a state-of-the-art sequence-to-sequence system with a VAE and a Householder Flow.
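The Householder Flow mentioned here is a volume-preserving normalizing flow: a chain of Householder reflections applied to the VAE's latent sample to enrich the posterior beyond a diagonal Gaussian. A minimal sketch of that building block follows; the function name and setup are illustrative, not the paper's implementation, and in practice the reflection vectors are produced by the encoder network rather than fixed.

```python
import numpy as np

def householder_flow(z, vs):
    """Apply a chain of Householder reflections to a latent sample z.

    Each reflection H = I - 2 v v^T / ||v||^2 is orthogonal and
    volume-preserving (|det H| = 1), so the log-determinant term in the
    flow's change-of-variables formula is zero -- which is what makes
    Householder flows cheap to use on top of a VAE posterior.
    """
    for v in vs:
        v = v / np.linalg.norm(v)      # unit reflection vector
        z = z - 2.0 * np.dot(v, z) * v  # reflect z across the hyperplane
    return z

# Example: two reflections applied to a 3-dimensional latent sample.
z = np.array([1.0, 2.0, 3.0])
vs = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])]
out = householder_flow(z, vs)  # norm of z is preserved
```

Because each step is a reflection, the transformed sample keeps the norm of the input, a property that makes the flow easy to sanity-check.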