scispace - formally typeset
Topic

Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Proceedings ArticleDOI
06 Jun 2021
TL;DR: In this article, a new set of linguistically informed metrics targeted explicitly at the problem of speech video interpolation is proposed. Despite high performance on conventional metrics such as MSE, PSNR, and SSIM, the authors find that state-of-the-art frame interpolation models fail to produce faithful speech interpolation.
Abstract: Here we explore the problem of speech video interpolation. Accounting for close to 70% of web traffic, such content today forms the primary form of online communication and entertainment. Despite high performance on conventional metrics such as MSE, PSNR, and SSIM, we find that state-of-the-art frame interpolation models fail to produce faithful speech interpolation. For instance, we observe that the lips stay static for most interpolated frames while the person is still speaking. With this motivation, using information about words, sub-words, and visemes, we provide a new set of linguistically informed metrics targeted explicitly at the problem of speech video interpolation. We release several datasets to test video interpolation models for their speech understanding. We also design linguistically informed deep learning video interpolation algorithms to generate the missing frames.
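The paper's actual metric definitions are not given in this abstract; as a minimal sketch, one linguistically informed measure could be a frame-wise viseme agreement rate between predicted and ground-truth viseme labels. The function name and label strings below are hypothetical, not the authors' definitions:

```python
def viseme_agreement(pred_visemes, true_visemes):
    """Fraction of frames whose predicted viseme label matches the
    ground truth -- a crude stand-in for a linguistically informed
    metric (hypothetical, not the paper's actual formulation)."""
    if len(pred_visemes) != len(true_visemes):
        raise ValueError("sequences must be frame-aligned")
    matches = sum(p == t for p, t in zip(pred_visemes, true_visemes))
    return matches / len(true_visemes)

# A model whose lips stay static predicts one viseme for every frame,
# so it scores low whenever the speaker actually articulates.
static = ["neutral"] * 6
truth = ["p", "a", "t", "a", "neutral", "neutral"]
score = viseme_agreement(static, truth)  # 2 of 6 frames match
```

Unlike pixel-level MSE or PSNR, such a measure penalizes a static mouth directly, which is the failure mode the abstract describes.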
Book ChapterDOI
25 Jun 2012
TL;DR: A design tool for creating correct speech visemes is presented, and the correctness of the generated visemes is tested on Slovak speech domains.
Abstract: Many deaf people use lip reading as their main form of communication. A viseme is a representational unit used to classify speech sounds in the visual domain; it describes the particular facial and oral positions and movements that occur alongside the voicing of phonemes. A design tool for creating correct speech visemes is presented. It is composed of five modules: one for creating phonemes, one for creating 3D speech visemes, one for facial expressions, one for synchronization between phonemes and visemes, and lastly one for generating speech triphones. We test the correctness of the generated visemes on Slovak speech domains. The paper describes our developed tool.
Patent
31 May 2019
TL;DR: In this paper, a dual-viseme mouth shape synthesis method is proposed to address the low fidelity of the pronunciation mouth shapes reproduced by existing mouth shape synthesis techniques.
Abstract: The embodiment of the invention discloses a dual-viseme mouth shape synthesis method. The method addresses the technical problem of the low fidelity of the pronunciation mouth shapes reproduced by existing mouth shape synthesis technology. The method comprises the following steps: the pronunciations of standard Chinese are classified into 13 viseme categories; corresponding mouth shape videos are recorded according to the viseme classification; a basic mouth shape viseme library is established from the original mouth shape videos; the basic mouth shape viseme library is subjected to dual-viseme treatment to obtain a basic mouth shape dual-viseme library; a speech recognition technology recognizes newly input speech to obtain a text material; after the text material is analyzed into initial consonants and simple or compound vowels, the mouth shape viseme corresponding to each is looked up in the basic mouth shape dual-viseme library; the mouth shape visemes are inserted at the corresponding points in time to form a discrete mouth shape sequence; and the discrete sequence is smoothed to obtain a continuous mouth shape sequence.
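The lookup-and-smooth pipeline described above can be sketched as follows. The viseme category numbers, the library contents, and the linear cross-fade used for smoothing are all illustrative assumptions, not the patent's actual tables or smoothing method:

```python
# Hypothetical viseme library: initial/final -> viseme category (1..13).
# Real systems record one mouth shape video per category.
VISEME_LIBRARY = {
    "b": 1, "p": 1, "m": 1,  # bilabials share one mouth shape
    "a": 5, "i": 8, "u": 11,
}

def to_keyframes(units, times):
    """Map recognized initials/finals to (time, viseme) keyframes,
    i.e. the discrete mouth shape sequence."""
    return [(t, VISEME_LIBRARY[u]) for u, t in zip(units, times)
            if u in VISEME_LIBRARY]

def smooth(keyframes, fps=25):
    """Linearly interpolate between discrete keyframes to obtain a
    continuous per-frame mouth shape parameter track."""
    frames = []
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        n = max(1, int((t1 - t0) * fps))
        for k in range(n):
            a = k / n
            frames.append((1 - a) * v0 + a * v1)
    frames.append(keyframes[-1][1])
    return frames

keys = to_keyframes(["b", "a"], [0.0, 0.2])  # [(0.0, 1), (0.2, 5)]
track = smooth(keys)                          # 6 frames at 25 fps
```

In practice each viseme would index a recorded video segment rather than a scalar, but the insert-then-smooth structure is the same.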
Book ChapterDOI
13 Feb 2020
TL;DR: In this paper, lip-reading models based on deep neural network architectures that capture temporal data are developed for the task of speech recognition.
Abstract: Lip reading is the ability to understand what a person is communicating using just the video information. Due to the advent of the Internet and computers, it is now possible to remove human intervention from lip reading. Such automation is only feasible because of two developments in the field of computer vision: the availability of large-scale datasets for training and the use of neural network models. The applications are numerous: from dictating messages to a device in a noisy environment to improving speech recognition in current technologies, visual speech recognition has proved to be pivotal. In this paper, lip-reading models based on deep neural network architectures that capture temporal data are developed for the task of speech recognition.
Proceedings ArticleDOI
01 Nov 2012
TL;DR: This paper presents the design and implementation of an online speech-driven talking head animation system that first recognizes a phoneme sequence from the input speech with a Chinese Mandarin speech recognizer; the sequence is then transformed into visemes and MPEG-4 facial animation parameters that drive the facial animations of a 3-dimensional talking head.
Abstract: This paper presents the design and implementation of an online speech-driven talking head animation system. The system first recognizes a phoneme sequence from the input speech with a Chinese Mandarin speech recognizer. The phoneme sequence is then transformed into a sequence of visemes. A sequence of MPEG-4 facial animation parameters (FAPs) is further derived from the viseme sequence and is used to drive the facial animations of a 3-dimensional talking head. The architecture and the major features are also presented in the paper, together with evaluations of the system.
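The phoneme-to-viseme stage of such a pipeline is a many-to-one table lookup, since visually indistinguishable phonemes (e.g. /b/, /p/, /m/) collapse to a single mouth shape. The mapping table and viseme names below are illustrative assumptions, not the system's actual inventory:

```python
# Hypothetical many-to-one phoneme -> viseme table.
PHONEME_TO_VISEME = {
    "b": "bilabial", "p": "bilabial", "m": "bilabial",
    "f": "labiodental",
    "a": "open", "o": "rounded", "u": "rounded",
}

def phonemes_to_visemes(phonemes, default="neutral"):
    """Collapse a recognized phoneme sequence into the viseme
    sequence that the FAP-generation stage would consume."""
    return [PHONEME_TO_VISEME.get(p, default) for p in phonemes]

visemes = phonemes_to_visemes(["b", "a", "m"])
# -> ["bilabial", "open", "bilabial"]
```

Each viseme would then be expanded into MPEG-4 FAP values (lip opening, protrusion, etc.) by a separate, system-specific table.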

Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations, 78% related
Feature vector: 48.8K papers, 954.4K citations, 76% related
Feature extraction: 111.8K papers, 2.1M citations, 75% related
Feature (computer vision): 128.2K papers, 1.7M citations, 74% related
Unsupervised learning: 22.7K papers, 1M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  7
2022  12
2021  13
2020  39
2019  19
2018  22