Topic
Viseme
About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic receiving 17889 citations.
Papers published on a yearly basis
Papers
01 Aug 2005
TL;DR: An efficient system for realistic speech animation is proposed, which supports all steps of the animation pipeline, from the capture or design of 3-D head models up to the synthesis and editing of the performance.
Abstract: An efficient system for realistic speech animation is proposed. The system supports all steps of the animation pipeline, from the capture or design of 3-D head models up to the synthesis and editing of the performance. This pipeline is fully 3-D, which yields high flexibility in the use of the animated character. Real, detailed 3-D face dynamics, observed at video frame rate for thousands of points on the faces of speaking actors, underpin the realism of the facial deformations. These are given a compact and intuitive representation via independent component analysis (ICA). Performances amount to trajectories through this ‘viseme space’. When asked to animate a face, the system replicates the ‘visemes’ that it has learned, and adds the necessary co-articulation effects. Realism has been improved through comparisons with motion-captured ground truth. Faces for which no 3-D dynamics could be observed can be animated nonetheless. Their visemes are adapted automatically to their physiognomy by localising the face in a ‘face space’.
22 citations
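The abstract above describes compressing captured 3-D face dynamics into a low-dimensional ‘viseme space’ via ICA, with each performance a trajectory of coefficients through that space. A minimal NumPy sketch of the idea, using an SVD/PCA basis as a stand-in for ICA (the frame counts, point counts, and random data are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical capture: 200 frames of 3-D positions for 50 face points,
# flattened to (frames, 150). The real system tracks thousands of points.
frames = rng.normal(size=(200, 150))

# Center and decompose. SVD stands in here for the ICA used in the paper;
# both yield a compact linear basis of deformation components.
mean_face = frames.mean(axis=0)
U, S, Vt = np.linalg.svd(frames - mean_face, full_matrices=False)

k = 6                    # keep a handful of 'viseme space' dimensions
basis = Vt[:k]           # (k, 150) deformation components

# A performance is then a trajectory of k coefficients per frame.
trajectory = (frames - mean_face) @ basis.T   # (200, k)

# Any point on the trajectory reconstructs an approximate face shape.
reconstructed = trajectory @ basis + mean_face
print(reconstructed.shape)  # (200, 150)
```

Animating a new utterance amounts to generating a new coefficient trajectory and mapping it back through the basis.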
23 Jun 1997
TL;DR: The perceptual boundaries of speech reading and multimedia technology, i.e. the constraints that affect speech-reading performance, are investigated, and conclusions are drawn on the relationship between viseme groupings, accuracy of viseme recognition, and presentation rate.
Abstract: In the future, multimedia technology will be able to provide video frame rates equal to or better than 30 frames per second (FPS). Until that time, the hearing-impaired community will be using band-limited communication systems over unshielded twisted-pair copper wiring. As a result, multimedia communication systems will use a coder/decoder (CODEC) to compress the video and audio signals for transmission. For these systems to be usable by the hearing-impaired community, the algorithms within the CODEC have to be designed to account for the perceptual boundaries of the hearing impaired. We investigate the perceptual boundaries of speech reading and multimedia technology, which are the constraints that affect speech-reading performance. We analyze and draw conclusions on the relationship between viseme groupings, accuracy of viseme recognition, and presentation rate. These results are critical in the design of multimedia systems for the hearing impaired.
22 citations
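A viseme grouping, as analyzed in the entry above, is a many-to-one collapse of phonemes onto visually indistinguishable mouth-shape classes. A small illustrative sketch (these particular groupings are a common textbook-style example, not taken from this paper):

```python
# Illustrative many-to-one phoneme-to-viseme grouping; actual groupings
# vary between studies and are assumptions here, not from the paper.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "z": "alveolar",
    "k": "velar", "g": "velar",
}

def to_visemes(phonemes):
    """Collapse a phoneme sequence onto its visually distinct classes."""
    return [PHONEME_TO_VISEME.get(p, "other") for p in phonemes]

print(to_visemes(["b", "a", "t"]))  # ['bilabial', 'other', 'alveolar']
```

Coarser groupings raise per-class recognition accuracy but discard distinctions the viewer must recover from context, which is why the relationship between grouping, accuracy, and presentation rate matters for CODEC design.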
01 Jan 1999
TL;DR: A three-dimensional facial model is combined with a commercial audio text-to-speech synthesizer; the confusion patterns of consonants and the identification of the Finnish visemes are examined to assess the intelligibility of both natural and synthetic auditory speech.
Abstract: We describe our Finnish audio-visual speech synthesizer, its evaluation and discuss possible improvements. We have combined a three dimensional facial model with a commercial audio text-to-speech synthesizer. The visual speech is based on a letter-to-viseme mapping and the animation is created by linear interpolation between the visemes. An intelligibility test was run to quantify the benefit of seeing the synthetic and natural face on hearing the synthetic and natural voice presented at different signal to noise ratios. Both natural and synthetic faces improved the intelligibility of both natural and synthetic auditory speech. We examined the confusion patterns of consonants and the identification of the Finnish visemes. We also propose how the viseme repertoire of the talking head can be improved.
22 citations
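The Finnish synthesizer above drives its animation by a letter-to-viseme mapping followed by linear interpolation between viseme targets. A minimal sketch of that pipeline, with made-up two-parameter viseme targets (jaw opening, lip rounding) standing in for the paper's full 3-D facial model:

```python
import numpy as np

# Hypothetical viseme targets as (jaw_open, lip_rounding) parameters;
# the real system drives a 3-D facial model, not two scalars.
VISEMES = {
    "a": np.array([0.9, 0.2]),
    "m": np.array([0.0, 0.1]),
    "o": np.array([0.6, 0.9]),
}

def animate(letters, steps=4):
    """Letter-to-viseme mapping, then linear interpolation between
    consecutive viseme targets to produce animation frames."""
    keys = [VISEMES[c] for c in letters]
    out = []
    for a, b in zip(keys, keys[1:]):
        for t in np.linspace(0.0, 1.0, steps, endpoint=False):
            out.append((1 - t) * a + t * b)
    out.append(keys[-1])
    return np.stack(out)

frames = animate("mao")
print(frames.shape)  # (9, 2): 4 steps per transition plus the final pose
```

Linear interpolation is the simplest co-articulation-free scheme; the paper's proposed improvements to the viseme repertoire address exactly the distinctions such interpolation smears out.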
TL;DR: The phoneme lipreading system's word accuracy outperforms the viseme-based system's word accuracy; however, the phoneme system achieved lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words.
Abstract: There is debate over whether phoneme or viseme units are the most effective for a lipreading system. Some studies use phoneme units even though phonemes describe unique short sounds; other studies have tried to improve lipreading accuracy by focusing on visemes, with varying results. We compare the performance of a lipreading system by modeling visual speech using either 13 viseme or 38 phoneme units. We report the accuracy of our system at both word and unit levels. The evaluation task is large-vocabulary continuous speech using the TCD-TIMIT corpus. We complete our visual speech modeling via hybrid DNN-HMMs, and our visual speech decoder is a Weighted Finite-State Transducer (WFST). We use DCT and Eigenlips as representations of the mouth ROI image. The phoneme lipreading system's word accuracy outperforms the viseme-based system's word accuracy. However, the phoneme system achieved lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words.
22 citations
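The finding above, that word accuracy can exceed unit accuracy, follows from the dictionary constraining decoding: a noisy unit sequence still snaps to the nearest valid pronunciation. A toy sketch of that effect using a closest-match lookup (a crude stand-in for the WFST decoder; the dictionary entries are hypothetical, not TCD-TIMIT):

```python
from difflib import SequenceMatcher

# Toy pronunciation dictionary (hypothetical entries).
DICT = {
    "bat": ["b", "ae", "t"],
    "cat": ["k", "ae", "t"],
    "mat": ["m", "ae", "t"],
}

def decode(units):
    """Map a noisy unit sequence to the closest dictionary word.
    A simplification of the WFST-based decoding used in the paper."""
    def sim(word):
        return SequenceMatcher(None, units, DICT[word]).ratio()
    return max(DICT, key=sim)

# One of three units is misrecognized (33% unit error rate), yet the
# dictionary constraint still recovers the intended word.
print(decode(["b", "ah", "t"]))  # bat
```

This is why a unit set with higher raw confusability (38 phonemes vs 13 visemes) can still yield higher word accuracy once the dictionary prunes impossible sequences.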