scispace - formally typeset
Topic

Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Proceedings ArticleDOI
06 Jun 2021
TL;DR: In this article, a new set of linguistically informed metrics targeted explicitly at the problem of speech video interpolation is proposed. Despite high performance on conventional metrics such as MSE, PSNR, and SSIM, the authors find that state-of-the-art frame interpolation models fail to produce faithful speech interpolation.
Abstract: Here we explore the problem of speech video interpolation. Accounting for close to 70% of web traffic, such content today forms the primary form of online communication and entertainment. Despite high performance on conventional metrics such as MSE, PSNR, and SSIM, we find that state-of-the-art frame interpolation models fail to produce faithful speech interpolation. For instance, we observe that the lips stay static for most interpolated frames while the person is still speaking. With this motivation, using information about words, sub-words, and visemes, we provide a new set of linguistically informed metrics targeted explicitly at the problem of speech video interpolation. We release several datasets to test video interpolation models for their speech understanding. We also design linguistically informed deep learning video interpolation algorithms to generate the missing frames.
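The paper's actual metric definitions are not given in this abstract; as a minimal sketch, one linguistically informed measure could be a frame-wise viseme agreement rate between predicted and ground-truth viseme labels. The function name and label strings below are hypothetical, not the authors' definitions:

```python
def viseme_agreement(pred_visemes, true_visemes):
    """Fraction of frames whose predicted viseme label matches the
    ground truth -- a crude stand-in for a linguistically informed
    metric (hypothetical, not the paper's actual formulation)."""
    if len(pred_visemes) != len(true_visemes):
        raise ValueError("sequences must be frame-aligned")
    matches = sum(p == t for p, t in zip(pred_visemes, true_visemes))
    return matches / len(true_visemes)

# A model whose lips stay static predicts one viseme for every frame,
# so it scores low whenever the speaker actually articulates.
static = ["neutral"] * 6
truth = ["p", "a", "t", "a", "neutral", "neutral"]
score = viseme_agreement(static, truth)  # 2 of 6 frames match
```

Unlike pixel-level MSE or PSNR, such a measure penalizes a static mouth directly, which is the failure mode the abstract describes.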
Book ChapterDOI
25 Jun 2012
TL;DR: A design tool for creating correct speech visemes is presented, and the correctness of the generated visemes is tested on Slovak speech domains.
Abstract: Many deaf people use lip reading as their main form of communication. A viseme is a representational unit used to classify speech sounds in the visual domain; it describes the particular facial and oral positions and movements that occur alongside the voicing of phonemes. A design tool for creating correct speech visemes is presented. It is composed of five modules: one for creating phonemes, one for creating 3D speech visemes, one for facial expressions, one for synchronization between phonemes and visemes, and lastly one for generating speech triphones. We test the correctness of the generated visemes on Slovak speech domains. The paper describes our developed tool.
Patent
31 May 2019
TL;DR: In this paper, a dual-viseme mouth shape synthesis method is proposed to address the low fidelity of the pronunciation mouth shapes reproduced by existing mouth shape synthesis techniques.
Abstract: The embodiment of the invention discloses a dual-viseme mouth shape synthesis method. The method addresses the technical problem of the low fidelity of the pronunciation mouth shapes reproduced by existing mouth shape synthesis technology. The method comprises the following steps: the pronunciations of standard Chinese are classified into 13 viseme categories; corresponding mouth shape videos are recorded according to the viseme classification; a basic mouth shape viseme library is established from the original mouth shape videos; the basic mouth shape viseme library is subjected to dual-viseme treatment to obtain a basic mouth shape dual-viseme library; a speech recognition technology recognizes newly input speech to obtain a text material; after the text material is analyzed into initial consonants and simple or compound vowels, the mouth shape viseme corresponding to each is looked up in the basic mouth shape dual-viseme library; the mouth shape visemes are inserted at the corresponding points in time to form a discrete mouth shape sequence; and the discrete sequence is smoothed to obtain a continuous mouth shape sequence.
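The lookup-and-smooth pipeline described above can be sketched as follows. The viseme category numbers, the library contents, and the linear cross-fade used for smoothing are all illustrative assumptions, not the patent's actual tables or smoothing method:

```python
# Hypothetical viseme library: initial/final -> viseme category (1..13).
# Real systems record one mouth shape video per category.
VISEME_LIBRARY = {
    "b": 1, "p": 1, "m": 1,  # bilabials share one mouth shape
    "a": 5, "i": 8, "u": 11,
}

def to_keyframes(units, times):
    """Map recognized initials/finals to (time, viseme) keyframes,
    i.e. the discrete mouth shape sequence."""
    return [(t, VISEME_LIBRARY[u]) for u, t in zip(units, times)
            if u in VISEME_LIBRARY]

def smooth(keyframes, fps=25):
    """Linearly interpolate between discrete keyframes to obtain a
    continuous per-frame mouth shape parameter track."""
    frames = []
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        n = max(1, int((t1 - t0) * fps))
        for k in range(n):
            a = k / n
            frames.append((1 - a) * v0 + a * v1)
    frames.append(keyframes[-1][1])
    return frames

keys = to_keyframes(["b", "a"], [0.0, 0.2])  # [(0.0, 1), (0.2, 5)]
track = smooth(keys)                          # 6 frames at 25 fps
```

In practice each viseme would index a recorded video segment rather than a scalar, but the insert-then-smooth structure is the same.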
Book ChapterDOI
13 Feb 2020
TL;DR: In this paper, lip-reading models based on deep neural network architectures that capture temporal data are developed for the task of speech recognition.
Abstract: Lip reading is the ability to understand what a person is communicating using just the video information. Due to the advent of the Internet and computers, it is now possible to remove human intervention from lip reading. Such automation is only feasible because of two developments in the field of computer vision: the availability of large-scale datasets for training and the use of neural network models. The applications are numerous: from dictating messages to a device in a noisy environment to improving speech recognition in current technologies, visual speech recognition has proved to be pivotal. In this paper, lip-reading models based on deep neural network architectures that capture temporal data are developed for the task of speech recognition.
Proceedings ArticleDOI
01 Nov 2012
TL;DR: This paper presents the design and implementation of an online speech-driven talking head animation system that first recognizes a phoneme sequence from the input speech with a Chinese Mandarin speech recognizer; the sequence is then transformed into visemes and MPEG-4 facial animation parameters that drive the facial animations of a 3-dimensional talking head.
Abstract: This paper presents the design and implementation of an online speech-driven talking head animation system. The system first recognizes a phoneme sequence from the input speech with a Chinese Mandarin speech recognizer. The phoneme sequence is then transformed into a sequence of visemes. A sequence of MPEG-4 facial animation parameters (FAPs) is further derived from the viseme sequence and is used to drive the facial animations of a 3-dimensional talking head. The architecture and the major features are also presented in the paper, together with evaluations of the system.
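The phoneme-to-viseme stage of such a pipeline is a many-to-one table lookup, since visually indistinguishable phonemes (e.g. /b/, /p/, /m/) collapse to a single mouth shape. The mapping table and viseme names below are illustrative assumptions, not the system's actual inventory:

```python
# Hypothetical many-to-one phoneme -> viseme table.
PHONEME_TO_VISEME = {
    "b": "bilabial", "p": "bilabial", "m": "bilabial",
    "f": "labiodental",
    "a": "open", "o": "rounded", "u": "rounded",
}

def phonemes_to_visemes(phonemes, default="neutral"):
    """Collapse a recognized phoneme sequence into the viseme
    sequence that the FAP-generation stage would consume."""
    return [PHONEME_TO_VISEME.get(p, default) for p in phonemes]

visemes = phonemes_to_visemes(["b", "a", "m"])
# -> ["bilabial", "open", "bilabial"]
```

Each viseme would then be expanded into MPEG-4 FAP values (lip opening, protrusion, etc.) by a separate, system-specific table.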

Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations, 78% related
Feature vector: 48.8K papers, 954.4K citations, 76% related
Feature extraction: 111.8K papers, 2.1M citations, 75% related
Feature (computer vision): 128.2K papers, 1.7M citations, 74% related
Unsupervised learning: 22.7K papers, 1M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  7
2022  12
2021  13
2020  39
2019  19
2018  22