Open Access Proceedings Article

A German viseme-set for automatic transcription of input text used for audio-visual speech synthesis.

Christian Weiss et al.
pp. 2945–2948
TLDR
A German viseme inventory for visemically transcribing text according to its phonetic transcription is introduced; an inventory of German viseme classes in a SAMPA-like labelling is worked out, and a model for automatic visemic transcription of given input text is trained.
Abstract
In this paper, we introduce a German viseme inventory for visemically transcribing text according to its phonetic transcription. A viseme set like the one presented in this work is essential for speech-driven audio-visual synthesis, because the selection of appropriate video segments is based on the visemically transcribed input text. For text-to-speech synthesis, a transcription of the input text into a phonemic representation is used to avoid ambiguous meanings, to acquire the correct pronunciation of the underlying input text, and to serve as labels in unit-selection-based synthesis systems. Likewise, the visual synthesis requires a transcription that represents, analogously to the phonemes, the visual counterpart, called a viseme in the related literature, which also serves as a unit label in our data-driven, video-realistic audio-visual synthesis system. We worked out an inventory of German viseme classes in a SAMPA-like labelling and trained a model for automatic visemic transcription of given input text.
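The listing does not reproduce the paper's actual phoneme-to-viseme table, so the following is only a minimal sketch of the kind of many-to-one mapping the abstract describes; every viseme class name and grouping below is an illustrative assumption, not the authors' inventory.

```python
# Illustrative sketch of a many-to-one mapping from German SAMPA phonemes
# to viseme classes. Class names and groupings are assumptions for
# demonstration only; they are NOT the inventory defined in the paper.
GERMAN_SAMPA_TO_VISEME = {
    # bilabials share one closed-lips mouth shape
    "p": "V_BILABIAL", "b": "V_BILABIAL", "m": "V_BILABIAL",
    # labiodentals
    "f": "V_LABIODENTAL", "v": "V_LABIODENTAL",
    # rounded vowels
    "o:": "V_ROUNDED", "u:": "V_ROUNDED", "y:": "V_ROUNDED",
    # open vowels
    "a": "V_OPEN", "a:": "V_OPEN",
}

def visemic_transcription(phonemes):
    """Map a SAMPA phoneme sequence to viseme labels, collapsing repeats."""
    visemes = []
    for ph in phonemes:
        vis = GERMAN_SAMPA_TO_VISEME.get(ph, "V_OTHER")
        if not visemes or visemes[-1] != vis:
            visemes.append(vis)
    return visemes

# "Mama" in SAMPA: m a: m a
print(visemic_transcription(["m", "a:", "m", "a"]))
# -> ['V_BILABIAL', 'V_OPEN', 'V_BILABIAL', 'V_OPEN']
```

Collapsing adjacent identical visemes reflects that consecutive phonemes sharing one mouth shape select the same visual unit; whether the authors' transcription does this is not stated in the abstract.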


Citations
Journal Article (DOI)

Animated virtual characters to explore audio-visual speech in controlled and naturalistic environments.

TL;DR: The results conclusively demonstrate that computer-generated speech stimuli are a judicious choice, and that they can supplement natural speech while offering greater control over stimulus timing and content.

Weakly Supervised Automatic Transcription of Mouthings for Gloss-Based Sign Language Corpora

TL;DR: This work proposes a method to automatically annotate mouthings in sign language corpora, requiring no more than a simple gloss annotation and a source of weak supervision, such as automatic speech transcripts.

Handling multimodality and scarce resources in sign language machine translation

TL;DR: This thesis improves the automatic alignment between annotated signs and their spoken-language translations by applying morphosyntactic and semantic analyses and by bridging the differences between the languages to find corresponding signs and phrases.
Proceedings Article (DOI)

Avatars 4 all: an avatar generation toolchain

TL;DR: This work-in-progress paper presents an application-driven approach to a versatile avatar-generation toolchain, targeting the lack of an economical solution for integrating avatars across multiple platforms and devices.
References
Journal Article (DOI)

Hearing lips and seeing voices

TL;DR: The study reported here demonstrates a previously unrecognised influence of vision upon speech perception: on being shown a film of a young woman's talking head in which repeated utterances of the syllable [ba] had been dubbed onto lip movements for [ga], normal adults reported hearing [da].
Journal Article (DOI)

A maximum entropy approach to natural language processing

TL;DR: A maximum-likelihood approach for automatically constructing maximum entropy models is presented, and an efficient implementation of this approach is described, using several problems in natural language processing as examples.
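For context (the listing itself gives no formulas), the conditional maximum entropy model that this reference constructs has the standard exponential form, with feature functions f_i, weights λ_i fitted by maximum likelihood, and per-context normalizer Z(x):

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_i \lambda_i f_i(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_i \lambda_i f_i(x, y') \Big)
```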
Proceedings Article (DOI)

Video Rewrite: driving visual speech with audio

TL;DR: Video Rewrite is the first facial-animation system to automate all the labeling and assembly tasks required to resync existing footage to a new soundtrack.
Book

Perceiving talking faces: from speech perception to a behavioral principle

TL;DR: This book presents a framework for perceiving and synthesizing talking faces, broadening the framework of the model, testing the model, and synthesizing and evaluating talking faces.
Book Chapter (DOI)

Modeling Coarticulation in Synthetic Visual Speech

TL;DR: An implementation of Lofqvist’s (1990) gestural theory of speech production for visual speech synthesis is described, along with the graphically controlled development system.
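This reference blends overlapping per-segment articulatory targets with dominance functions that peak at each segment's center and decay with temporal distance. A minimal sketch in the spirit of that dominance-function approach; all parameter values and the lip-rounding example are chosen purely for illustration:

```python
import math

def dominance(t, center, alpha=1.0, theta=4.0, c=1.0):
    """Dominance weight: peaks at the segment center and decays
    exponentially with temporal distance (illustrative parameters)."""
    return alpha * math.exp(-theta * abs(t - center) ** c)

def blended_parameter(t, segments):
    """Dominance-weighted average of per-segment articulatory targets.
    segments: list of (center_time_s, target_value) pairs."""
    weights = [dominance(t, center) for center, _ in segments]
    total = sum(weights)
    return sum(w * target for w, (_, target) in zip(weights, segments)) / total

# Two overlapping segments with lip-rounding targets 0.9 and 0.1:
# the blend transitions smoothly between them (coarticulation).
segments = [(0.10, 0.9), (0.25, 0.1)]
for t in (0.10, 0.175, 0.25):
    print(f"t={t:.3f}s  lip-rounding={blended_parameter(t, segments):.2f}")
```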