Topic

Viseme

About: Viseme is a research topic. A viseme is the visual counterpart of a phoneme: the facial and lip configuration that accompanies a unit of speech. Over the lifetime of the topic, 865 publications have been published, receiving 17,889 citations.


Papers
Posted Content
TL;DR: In this article, a three-stage Long Short-Term Memory (LSTM) network architecture is proposed to produce animator-centric speech motion curves that drive a JALI or standard FACS-based production face-rig, directly from input audio.
Abstract: We present a novel deep-learning-based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face-rig directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycho-linguistic insights: segmenting speech audio into a stream of phonetic groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly correlated with the motion of facial landmarks; and animator style is encoded in viseme motion-curve profiles. Our contribution is a solution for automatic, real-time lip synchronization from audio that integrates seamlessly into existing animation pipelines. We evaluate our results by cross-validation against ground-truth data, animator critique and edits, visual comparison to recent deep-learning lip-synchronization solutions, and by showing our approach to be resilient to diversity in speaker and language.

29 citations
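
The three-stage decomposition described above maps naturally onto a stacked recurrent model. The following PyTorch sketch is illustrative only, not the authors' network: the feature sizes, the softmax phonetic-group posteriors, and the exact wiring of the style stream into the curve-generating LSTM are all assumptions made for the example.

import torch
import torch.nn as nn

class ThreeStageLipSync(nn.Module):
    # Hypothetical pipeline in the spirit of the abstract:
    # audio -> phonetic groups -> speech style -> viseme motion curves.
    def __init__(self, n_audio=26, n_groups=12, n_style=8, n_curves=30, hidden=128):
        super().__init__()
        self.group_lstm = nn.LSTM(n_audio, hidden, batch_first=True)
        self.to_groups = nn.Linear(hidden, n_groups)    # phonetic-group posteriors
        self.style_lstm = nn.LSTM(n_audio, hidden, batch_first=True)
        self.to_style = nn.Linear(hidden, n_style)      # mumbling/shouting style code
        self.curve_lstm = nn.LSTM(n_groups + n_style, hidden, batch_first=True)
        self.to_curves = nn.Linear(hidden, n_curves)    # rig-ready motion curves

    def forward(self, audio):                           # audio: (batch, frames, n_audio)
        g, _ = self.group_lstm(audio)
        groups = torch.softmax(self.to_groups(g), dim=-1)
        s, _ = self.style_lstm(audio)
        style = self.to_style(s)
        c, _ = self.curve_lstm(torch.cat([groups, style], dim=-1))
        return self.to_curves(c)                        # one curve vector per audio frame

curves = ThreeStageLipSync()(torch.randn(1, 100, 26))   # 100 audio frames in, 100 curve frames out

Each output frame would then be mapped onto JALI or FACS rig controls by the animation pipeline.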

28 Aug 2006
TL;DR: In this paper, a trainable trajectory formation system for facial animation is proposed that dissociates the parametric spaces and methods for movement planning and execution; planning is achieved by HMM-based trajectory formation.
Abstract: A new trainable trajectory formation system for facial animation is proposed that dissociates the parametric spaces and methods for movement planning and execution. Movement planning is achieved by HMM-based trajectory formation; movement execution is performed by concatenation of multi-represented diphones. Planning ensures that the essential visual characteristics of visemes are reached (lip closing for bilabials, rounding and opening for palatal fricatives, etc.) and that appropriate coarticulation is planned. Execution grafts phonetic details and idiosyncratic articulatory strategies (dissymmetries, the importance of jaw movements, etc.) onto the planned gestural score.

28 citations
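
The planning/execution split is the interesting design choice here: planning only has to hit the essential visual targets, while execution supplies phonetic detail. Below is a toy Python sketch of that split, with invented viseme targets standing in for the HMM planner and linear interpolation standing in for diphone concatenation:

import numpy as np

VISEME_TARGETS = {                                 # invented example values
    "b": {"aperture": 0.0, "rounding": 0.3},       # bilabial: lips must close
    "S": {"aperture": 0.4, "rounding": 0.9},       # palatal fricative: rounded
    "a": {"aperture": 1.0, "rounding": 0.2},       # open vowel
}

def plan(phonemes):
    # Planning stage (stand-in for HMM trajectory formation): look up the
    # essential visual target each viseme must reach.
    return [VISEME_TARGETS[p] for p in phonemes]

def execute(targets, frames_per_segment=10):
    # Execution stage (stand-in for diphone concatenation): interpolate
    # between successive targets; only the lip-aperture channel is shown.
    segments = [np.linspace(a["aperture"], b["aperture"], frames_per_segment)
                for a, b in zip(targets, targets[1:])]
    return np.concatenate(segments)

print(execute(plan(["b", "a", "S"])))              # closure -> open -> rounded

In the actual system the execution stage would graft recorded articulatory detail onto this planned score rather than interpolating linearly.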

Proceedings Article
01 Jan 1994
TL;DR: A statistical model of speech is developed that incorporates certain temporal properties of human speech perception; such perceptual modeling may in principle allow statistical modeling of the speech components that are most relevant for discriminating between candidate utterances during recognition.
Abstract: We have developed a statistical model of speech that incorporates certain temporal properties of human speech perception. The primary goal of this work is to avoid a number of current constraining assumptions for statistical speech recognition systems, particularly the model of speech as a sequence of stationary segments consisting of uncorrelated acoustic vectors. A focus on perceptual models may in principle allow for statistical modeling of speech components that are more relevant for discrimination between candidate utterances during speech recognition. In particular, we hope to develop systems that have some of the robust properties of human audition for speech collected under adverse conditions. The outline of this new research direction is given here, along with some preliminary theoretical work.

28 citations
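
The assumption being dropped, that a segment is a stationary sequence of uncorrelated acoustic vectors, can be made concrete with a toy likelihood comparison. The numbers and the first-order temporal model below are invented for illustration; they are not the paper's model:

import numpy as np
from scipy.stats import norm

x = np.array([0.9, 1.1, 1.3, 1.5])       # a slowly rising feature track

# (a) stationary segment of uncorrelated frames: one Gaussian fits all frames
mu, sigma = x.mean(), x.std(ddof=1)
ll_iid = norm.logpdf(x, mu, sigma).sum()

# (b) first-order temporal model: each frame is predicted from the previous one
rho = 0.9                                 # assumed frame-to-frame correlation
resid = x[1:] - rho * x[:-1]
ll_ar = (norm.logpdf(x[0], mu, sigma)
         + norm.logpdf(resid, resid.mean(), resid.std(ddof=1)).sum())

print(ll_iid, ll_ar)                      # the temporal model fits the ramp far better

A model that captures such frame-to-frame structure can spend its capacity on the dynamics that actually discriminate between candidate utterances.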

Journal Article
TL;DR: Methods for reinforcing visible-speech recognition within the separate-identification framework are outlined, and it is shown that these methods improve the performance of the DI+SI-based system under varying noise-level conditions.
Abstract: In recent years a number of techniques have been proposed to improve the accuracy and robustness of automatic speech recognition in noisy environments. Among these, supplementing the acoustic information with visual data, mostly extracted from the speaker's lip shapes, has proved successful. We have already demonstrated the effectiveness of integrating visual data at two different levels during speech decoding, according to both direct and separate identification strategies (DI+SI). This paper outlines methods for reinforcing visible-speech recognition in the framework of separate identification. First, we define visual-specific units using a self-organizing mapping technique. Second, we complete a stochastic learning of these units with a discriminative, neural-network-based technique for speech recognition purposes. Finally, we show on a connected-letter speech recognition task that using these methods improves the performance of the DI+SI-based system under varying noise-level conditions.

28 citations
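
Of the two learning steps above, the self-organizing mapping is the easiest to sketch. Below is a minimal 1-D SOM over stand-in lip-shape feature vectors; the data, map size, and decay schedules are all invented, and the discriminative network and DI+SI decoding stages are omitted:

import numpy as np

rng = np.random.default_rng(0)
lip_feats = rng.normal(size=(500, 4))     # stand-in lip-shape descriptors
units = rng.normal(size=(8, 4))           # 8 prototype "visual units"

for t in range(2000):                     # online SOM training
    x = lip_feats[rng.integers(len(lip_feats))]
    bmu = np.argmin(((units - x) ** 2).sum(axis=1))      # best-matching unit
    lr = 0.1 * (1 - t / 2000)                            # decaying learning rate
    radius = 2.0 * (1 - t / 2000) + 0.5                  # shrinking neighbourhood
    dist = np.abs(np.arange(8) - bmu)                    # distance on the 1-D grid
    h = np.exp(-dist ** 2 / (2 * radius ** 2))
    units += lr * h[:, None] * (x - units)               # pull neighbours toward x

labels = np.argmin(((lip_feats[:, None, :] - units) ** 2).sum(-1), axis=1)
print(np.bincount(labels, minlength=8))   # frames assigned to each visual unit

Each frame's unit label would then feed the discriminative, neural-network-based recognizer described in the abstract.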


Network Information
Related Topics (5)
Topic                        Papers     Citations    Related
Vocabulary                   44.6K      941.5K       78%
Feature vector               48.8K      954.4K       76%
Feature extraction           111.8K     2.1M         75%
Feature (computer vision)    128.2K     1.7M         74%
Unsupervised learning        22.7K      1M           73%
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22