Topic
Viseme
About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.
Papers published on a yearly basis
Papers
15 Jun 2005
TL;DR: A new method maps natural speech to lip-shape animation in real time using neural networks; genetic algorithms configure the network topology automatically, eliminating tedious manual design by trial and error and considerably improving viseme classification.
Abstract: In this paper we present a new method for mapping natural speech to lip-shape animation in real time. The speech signal, represented by MFCC vectors, is classified into viseme classes using neural networks. The topology of the neural networks is configured automatically using genetic algorithms, which eliminates the need for tedious manual network design by trial and error and considerably improves the viseme classification results. The method runs in both real-time and offline modes and is suitable for various applications; we therefore propose new multimedia services for mobile devices based on the described lip-sync system.
6 citations
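As a rough illustration of the classification step described above, the sketch below trains a small one-hidden-layer network to map MFCC-like vectors to viseme classes. Everything here is synthetic and illustrative: the data is randomly generated rather than real speech, and the fixed hidden-layer size stands in for the paper's genetic-algorithm topology search.

```python
import numpy as np

# Hypothetical sketch (not the paper's system): a one-hidden-layer network
# maps 13-dim MFCC-like vectors to one of 5 viseme classes.
rng = np.random.default_rng(0)
N_MFCC, N_HIDDEN, N_VISEMES = 13, 8, 5

# Synthetic "MFCC" frames: each viseme class clusters around a random mean.
means = rng.normal(size=(N_VISEMES, N_MFCC))
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(40, N_MFCC)) for m in means])
y = np.repeat(np.arange(N_VISEMES), 40)

W1 = rng.normal(scale=0.1, size=(N_MFCC, N_HIDDEN))
W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_VISEMES))

def forward(X):
    h = np.tanh(X @ W1)                               # hidden activations
    logits = h @ W2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, e / e.sum(axis=1, keepdims=True)        # softmax probabilities

# Plain batch gradient descent on cross-entropy loss.
for _ in range(500):
    h, p = forward(X)
    grad = p.copy()
    grad[np.arange(len(y)), y] -= 1                   # dL/dlogits
    grad /= len(y)
    W2 -= 0.5 * (h.T @ grad)
    W1 -= 0.5 * (X.T @ ((grad @ W2.T) * (1 - h ** 2)))

_, p = forward(X)
accuracy = (p.argmax(axis=1) == y).mean()
```

In the actual system the per-frame class posteriors would then drive lip-shape animation; here `accuracy` on the training clusters is only a sanity check that the mapping is learnable.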
TL;DR: An automatic multimodal translation suitable for video mail or automatic dubbing into other languages is achieved by substituting only the image of the speech organs with a synthesized one, generated by a 3D wire-frame model adaptable to any speaker.
Abstract: We introduce a multimodal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion by synchronizing it to the translated speech. The system combines a face synthesis technique that can generate any viseme lip shape with a face tracking technique that estimates the original position and rotation of the speaker's face in an image sequence. To retain the speaker's facial expression, we substitute only the image of the speech organs with a synthesized one, generated by a 3D wire-frame model adaptable to any speaker. Our approach provides translated image synthesis with an extremely small database. The face is tracked across video frames by template matching, and its translation and rotation are detected using a 3D personal face model whose texture is captured from a video frame. We also propose a method to customize the personal face model with our GUI tool. Combining these techniques with translated voice synthesis yields an automatic multimodal translation suitable for video mail or automatic dubbing into other languages.
6 citations
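The template-matching step used for face tracking above can be sketched as exhaustive normalized cross-correlation over a frame. This is a minimal, assumption-laden toy: the frame and template are random arrays, and a real system would track a textured 3D personal face model rather than a raw 2D patch.

```python
import numpy as np

# Hypothetical sketch of template matching: slide the template over the
# frame and score each position by normalized cross-correlation (NCC).
def match_template(frame, template):
    th, tw = template.shape
    t = template - template.mean()
    best, best_pos = -np.inf, (0, 0)
    for i in range(frame.shape[0] - th + 1):
        for j in range(frame.shape[1] - tw + 1):
            patch = frame[i:i + th, j:j + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum() * (t * t).sum())
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos, best

rng = np.random.default_rng(1)
frame = rng.random((40, 40))            # synthetic video frame
template = frame[12:20, 25:33].copy()   # patch cut from a known location
pos, score = match_template(frame, template)
# pos recovers (12, 25); score is ~1.0 for an exact match
```

NCC is invariant to brightness offsets and contrast scaling of the patch, which is why it is a common choice for this kind of tracking; production systems use FFT-based correlation rather than this exhaustive double loop.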
01 Sep 2000
TL;DR: The robust technique of Support Vector Machines is applied to learn a regression function from a sparse subset of Haar coefficients to the LMM parameters, bypassing current computationally intensive analysis-by-synthesis methods for matching objects to morphable models.
Abstract: This paper describes a method for estimating the parameters of a linear morphable model (LMM) that models mouth images. The method uses a learning-based approach to estimate the LMM parameters directly from images of the object class (in this case, mouths). Thus this method can be used to bypass current computationally intensive methods that use analysis by synthesis for matching objects to morphable models. We have used the invariance properties of Haar wavelets for representing mouth images. We apply the robust technique of Support Vector Machines (SVM) for learning a regression function from a sparse subset of Haar coefficients to the LMM parameters. The estimation of LMM parameters could possibly have application to other problems in vision. We investigate one such application, namely viseme recognition.
6 citations
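A minimal sketch of the regression idea above, assuming scikit-learn's `SVR` as the SVM regressor: Haar coefficients of a synthetic 1-D "mouth" signal are regressed onto a single latent parameter standing in for an LMM parameter. All data and names here are illustrative, not the paper's actual images or model.

```python
import numpy as np
from sklearn.svm import SVR

def haar_coeffs(signal):
    """One level of the 1-D Haar transform: pairwise averages and differences."""
    s = signal.reshape(-1, 2)
    avg = (s[:, 0] + s[:, 1]) / np.sqrt(2)
    diff = (s[:, 0] - s[:, 1]) / np.sqrt(2)
    return np.concatenate([avg, diff])

rng = np.random.default_rng(2)
# Synthetic stand-in data: each 16-sample "mouth" signal is controlled by
# one latent parameter p (think "mouth openness"), which we try to recover.
params = rng.uniform(0, 1, size=200)
signals = np.array([p * np.sin(np.linspace(0, np.pi, 16))
                    + rng.normal(scale=0.01, size=16) for p in params])

# Sparse subset of Haar coefficients as features, as in the paper's idea.
X = np.array([haar_coeffs(s)[:4] for s in signals])

svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X[:150], params[:150])
pred = svr.predict(X[150:])
mae = np.abs(pred - params[150:]).mean()
```

The appeal of this direct-regression route is exactly what the abstract claims: one cheap feature transform plus one learned function, instead of an iterative analysis-by-synthesis fit per image.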
01 Apr 2012
6 citations
25 Aug 2013
TL;DR: A modified composition algorithm combines two finite-state transducers, representing the context-dependent lexicon and the language model respectively, for large-vocabulary speech recognition.
Abstract: This paper describes a modified composition algorithm that is used for combining two finite-state transducers, representing the context-dependent lexicon and the language model respectively, in large-vocabulary speech recognition. The algorithm is a hybrid between static and dynamic expansion of the resultant transducer, which maps from context-dependent phones to words and is searched during decoding. The approach is to pre-compute part of the recognition transducer and leave the balance to be expanded during decoding, allowing a fine-grained trade-off between space and time in recognition. For example, the time overhead of purely dynamic expansion can be reduced more than six-fold with only a 20% increase in memory in a collection of large-vocabulary recognition tasks available on the Google Android platform.
6 citations
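The transducer composition described above can be sketched as a product construction: states of the result are pairs of states, and an arc exists where an output label of the first transducer matches an input label of the second. The toy below builds the full product eagerly (the paper's contribution is precisely to expand only part of it aheadad of decoding) and omits epsilon-label handling for brevity; labels and states are invented for illustration.

```python
# Hypothetical sketch of finite-state transducer composition.
# Transducers are dicts: state -> list of (in_label, out_label, next_state).
def compose(t1, t2):
    arcs, stack, seen = {}, [(0, 0)], {(0, 0)}
    while stack:
        s1, s2 = stack.pop()
        arcs[(s1, s2)] = []
        for i1, o1, n1 in t1.get(s1, []):
            for i2, o2, n2 in t2.get(s2, []):
                if o1 == i2:  # matching labels fuse into one arc
                    arcs[(s1, s2)].append((i1, o2, (n1, n2)))
                    if (n1, n2) not in seen:
                        seen.add((n1, n2))
                        stack.append((n1, n2))
    return arcs

# Toy example: t1 maps context-dependent phones to phones,
# t2 maps phones to words ("-" marks a word-internal placeholder).
t1 = {0: [("k/a_t", "k", 1)], 1: [("a/k_t", "a", 2)], 2: [("t/a_#", "t", 3)]}
t2 = {0: [("k", "-", 1)], 1: [("a", "-", 2)], 2: [("t", "cat", 3)]}
result = compose(t1, t2)
# The single path through `result` reads context-dependent phones on the
# input side and emits the word "cat" on its final arc.
```

The hybrid scheme in the paper would run this product construction offline for the dense, frequently visited part of the machine and lazily on demand for the rest, which is where the reported space/time trade-off comes from.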