VoCo: text-based insertion and replacement in audio narration

doi:10.1145/3072959.3073702

Journal Article

Zeyu Jin, +4 more

- Vol. 36, Iss: 4, pp 96

TLDR

This paper presents a system that can synthesize a new word or short phrase so that it blends seamlessly into the context of the existing narration, using a text-to-speech synthesizer to say the word in a generic voice and then using voice conversion to convert it into a voice that matches the narration.
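One small piece of the voice-conversion step can be illustrated with a common baseline, shown here only as an example and not as VoCo's method: mapping the generic voice's log-F0 (pitch) contour onto the target speaker's log-F0 mean and standard deviation. The sketch assumes frame-level F0 values in Hz have already been extracted for both the synthesized word and the narration, with 0 marking unvoiced frames; `log_f0_stats` and `convert_f0` are hypothetical names.

```python
import math
from statistics import mean, stdev

# Illustrative baseline (assumption): Gaussian log-F0 transform, not VoCo's algorithm.

def log_f0_stats(f0_hz):
    """Mean and sample standard deviation of log F0 over voiced frames (F0 > 0).

    Requires at least two voiced frames with differing F0.
    """
    logs = [math.log(f) for f in f0_hz if f > 0]
    return mean(logs), stdev(logs)

def convert_f0(src_f0, src_stats, tgt_stats):
    """Map a source F0 contour onto the target speaker's log-F0 distribution.

    Each voiced frame is z-scored against the source statistics and rescaled
    to the target statistics; unvoiced frames (F0 == 0) stay at 0.
    """
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    out = []
    for f in src_f0:
        if f <= 0:
            out.append(0.0)
        else:
            z = (math.log(f) - mu_s) / sd_s
            out.append(math.exp(mu_t + sd_t * z))
    return out

# Usage: move a TTS word's contour into the narrator's pitch range.
generic_f0 = [0.0, 100.0, 110.0, 120.0, 0.0]             # contour of the TTS word
narration_f0 = [200.0, 220.0, 180.0, 210.0, 0.0, 190.0]  # frames from the narration
converted = convert_f0(generic_f0, log_f0_stats(generic_f0), log_f0_stats(narration_f0))
```

Because the mapping is monotonic, the word keeps its own intonation shape while moving into the narrator's pitch range. Converting timbre (the spectral envelope) is a separate and harder problem that this sketch does not address.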

Abstract:

Editing audio narration using conventional software typically involves many painstaking low-level manipulations. Some state-of-the-art systems allow the editor to work in a text transcript of the narration and perform select, cut, copy and paste operations directly in the transcript; these operations are then automatically applied to the waveform in a straightforward manner. However, an obvious gap in the text-based interface is the lack of any way to type new words not appearing in the transcript, for example inserting a new word for emphasis or replacing a misspoken word. While high-quality voice synthesizers exist today, the challenge is to synthesize the new word in a voice that matches the rest of the narration. This paper presents a system that can synthesize a new word or short phrase so that it blends seamlessly into the context of the existing narration. Our approach is to use a text-to-speech synthesizer to say the word in a generic voice, and then use voice conversion to convert it into a voice that matches the narration. Offering a range of degrees of control to the editor, our interface supports fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and even guidance by the editor's own voice. The paper presents studies showing that the output of our method is preferred over baseline methods and often indistinguishable from the original voice.
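Transcript-level operations like these rest on a word-level alignment between the text and the waveform. The following is a minimal sketch, assuming word boundaries (in samples) come from some forced aligner and that the replacement audio has already been synthesized and voice-converted: replacing a word swaps out its sample range and applies a short linear crossfade at each seam to avoid clicks. `Word`, `replace_word`, and the crossfade are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    """One transcript word aligned to the waveform (hypothetical structure)."""
    text: str
    start: int  # index of the word's first sample
    end: int    # index one past the word's last sample

def _crossfade_join(a: List[float], b: List[float], fade: int) -> List[float]:
    """Concatenate a and b, overlapping up to `fade` samples with a linear ramp."""
    n = min(fade, len(a), len(b))
    if n == 0:
        return a + b
    mixed = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of b rises from near 0 to near 1
        mixed.append(a[len(a) - n + i] * (1.0 - w) + b[i] * w)
    return a[: len(a) - n] + mixed + b[n:]

def replace_word(samples: List[float], words: List[Word], index: int,
                 new_audio: List[float], fade: int = 64) -> List[float]:
    """Replace the audio of words[index] with new_audio.

    new_audio stands in for a word that has already been synthesized and
    voice-converted. Each seam shortens the output by its overlap length.
    """
    w = words[index]
    left = samples[: w.start]
    right = samples[w.end :]
    return _crossfade_join(_crossfade_join(left, new_audio, fade), right, fade)

# Usage: swap the samples of "cat" for six new samples.
samples = [1.0] * 10 + [5.0] * 4 + [1.0] * 10
words = [Word("the", 0, 10), Word("cat", 10, 14), Word("sat", 14, 24)]
edited = replace_word(samples, words, 1, [2.0] * 6, fade=2)
# len(edited) == 22: 10 + 6 + 10 samples, minus 2 overlapped at each seam
```

A real editor would also shift the timings of every word after the edit so that later transcript operations still map to the right samples; the sketch leaves the alignment untouched for brevity.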

Citations

Journal Article

Text-based editing of talking-head video

Ohad Fried, +9 more

- 12 Jul 2019 -

ACM Transactions on Graphics

TL;DR: This work proposes a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts).

Posted Content

Text-based Editing of Talking-head Video

Ohad Fried, +9 more

- 04 Jun 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposed a method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts).

Proceedings Article

FFTNet: A Real-Time Speaker-Dependent Neural Vocoder

Zeyu Jin, +3 more

TL;DR: FFTNet offers two improvements over WaveNet: it is substantially faster, allowing real-time synthesis of audio waveforms, and when it is used as a vocoder, the resulting speech sounds more natural, as measured by a “mean opinion score” test.

Proceedings Article

Fitting New Speakers Based on a Short Untranscribed Sample

Eliya Nachmani, +3 more

TL;DR: An additional network that, given an audio sample, places the speaker in the embedding space is trained as part of the speech synthesis system using various consistency losses; the results show greatly improved performance both on the dataset speakers and, more importantly, when fitting new voices, even from very short samples.

Journal Article

Anticipating and addressing the ethical implications of deepfakes in the context of elections

Nicholas Diakopoulos, +1 more

- 01 Jul 2021 -

New Media & Society

TL;DR: The ethical issues raised by deepfakes are examined and four potential forms of intervention are discussed with respect to multi-stakeholder responsibility for addressing harms, including education and media literacy, subject defense, verification, and publicity moderation.
