Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Posted Content · DOI · 21 Jan 2023
TL;DR: In this article, the authors present a large-scale audio-visual dataset for Persian consisting of almost 220 hours of video from 1760 speakers, suitable for automatic speech recognition, audio-visual speech recognition, and speaker recognition.
Abstract: In recent years, significant progress has been made in automatic lip reading, but these methods require large-scale datasets that do not exist for many low-resource languages. In this paper, we present a new multipurpose audio-visual dataset for Persian. This dataset consists of almost 220 hours of video from 1760 speakers. In addition to lip reading, the dataset is suitable for automatic speech recognition, audio-visual speech recognition, and speaker recognition. It is also the first large-scale lip-reading dataset in Persian. A baseline method is provided for each of these tasks. In addition, we propose a technique to detect visemes (the visual equivalent of phonemes) in Persian. The visemes obtained by this method increase lip-reading accuracy by 7% relative to the previously proposed visemes, and the technique can be applied to other languages as well.
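The abstract hinges on the phoneme-to-viseme mapping. As a minimal illustration of the idea (not the paper's Persian method; the viseme classes below are hypothetical English-style groupings), such a mapping is a simple many-to-one lookup:

```python
# Minimal sketch of a phoneme-to-viseme mapping of the kind used in lip reading.
# The class groupings are illustrative only; the paper's Persian viseme
# inventory is not given in the abstract.
PHONEME_TO_VISEME = {
    # bilabials share one closed-lips viseme
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    # labiodentals
    "f": "V_labiodental", "v": "V_labiodental",
    # rounded vowels
    "o": "V_rounded", "u": "V_rounded",
    # open vowels
    "a": "V_open",
}

def phonemes_to_visemes(phoneme_seq):
    """Map a phoneme sequence to its viseme sequence, collapsing repeats."""
    visemes = []
    for p in phoneme_seq:
        v = PHONEME_TO_VISEME.get(p, "V_other")
        if not visemes or visemes[-1] != v:
            visemes.append(v)
    return visemes

print(phonemes_to_visemes(["b", "a", "b", "a"]))
# ['V_bilabial', 'V_open', 'V_bilabial', 'V_open']
```

Because several phonemes share one lip shape, the mapping is many-to-one, which is why the choice of viseme classes directly affects lip-reading accuracy.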
Proceedings Article · DOI · 01 Aug 1999
TL;DR: This paper presents an automatic vowel recognition system that performs in quasi real-time, focusing mainly on the recognition of 5 German vowels and their corresponding visemes (images).
Abstract: The performance of speech recognition systems decreases dramatically in noisy environments. A robust human-computer interaction system should therefore make use of both acoustic and visual signals. In this paper we present an automatic vowel recognition system that can perform in quasi real-time. We focus mainly on the recognition of 5 different German vowels (a, e, i, o, u) and their corresponding visemes (images). First, the position of the continuously moving face is determined. The speech parameters of the spoken vowel, along with the model parameters of the lip's image, are fed to a neural network to recognize the uttered vowel. The face tracking is shape-independent and imposes no special requirements on the color or shape of the face.
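As a sketch of the fusion step described above, acoustic parameters and lip-model parameters can be concatenated and fed to a small neural network classifying the five vowels. The feature dimensions, network shape, and the use of scikit-learn are assumptions for illustration; the abstract does not specify the paper's architecture:

```python
# Early audio-visual fusion: concatenate acoustic and lip-shape parameters
# per frame, then classify the vowel with a small MLP. Feature sizes and
# the synthetic training data are placeholders, not the paper's setup.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
VOWELS = ["a", "e", "i", "o", "u"]

# Placeholder data: 12 acoustic + 6 lip-model parameters per frame.
X_audio = rng.normal(size=(500, 12))
X_lip = rng.normal(size=(500, 6))
X = np.hstack([X_audio, X_lip])          # fusion by concatenation
y = rng.integers(0, len(VOWELS), size=500)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(X, y)
frame = np.hstack([rng.normal(size=12), rng.normal(size=6)])
print("predicted vowel:", VOWELS[clf.predict(frame.reshape(1, -1))[0]])
```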
Patent · DOI
TL;DR: The recognition rate of a speech recognition system is improved by compensating for changes in the user's speech that result from factors such as emotion, anxiety or fatigue.
Abstract: The recognition rate of a speech recognition system is improved by compensating for changes in the user's speech that result from factors such as emotion, anxiety, or fatigue. A speech signal derived from a user's utterance is modified by a preprocessor and provided to the speech recognition system to improve the recognition rate. The speech signal is modified based on a bio-signal that is indicative of the user's emotional state.
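A minimal sketch of the compensation idea, assuming the bio-signal has been reduced to a 0-1 stress score and that speaking-rate normalization is the chosen correction (the patent abstract does not commit to a specific transformation):

```python
# Illustrative pre-processing step: modify the speech signal before
# recognition, conditioned on a bio-signal indicating stress. Slowing
# hurried, stressed speech by naive resampling is one hypothetical
# correction, not the patent's specified method.
import numpy as np

def compensate(speech, stress_score, max_stretch=0.15):
    """Stretch the signal in proportion to a 0..1 stress score."""
    rate = 1.0 + max_stretch * np.clip(stress_score, 0.0, 1.0)
    n_out = int(len(speech) * rate)
    old_t = np.linspace(0.0, 1.0, num=len(speech))
    new_t = np.linspace(0.0, 1.0, num=n_out)
    return np.interp(new_t, old_t, speech)

signal = np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16000))  # 1 s dummy tone
print(len(compensate(signal, stress_score=0.8)))  # stretched sample count
```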
Journal Article · DOI
TL;DR: The paper shows that an HMM describing the dynamics of speech, coupled with a combined feature describing global and local texture, is the best model.
Abstract: This paper aims to give a solution for the construction of a Chinese visual speech feature model based on HMMs. We propose and discuss three representation models of visual speech: lip geometrical features, lip motion features, and lip texture features. The model combining the advantages of local LBP and global DCT texture information shows better performance than either single feature. Likewise, the model combining local LBP and geometrical information is better than a single feature. By computing viseme recognition rates for each model, the paper shows that an HMM describing the dynamics of speech, coupled with the combined feature describing global and local texture, is the best model.
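A sketch of the combined texture feature described above: a local LBP histogram concatenated with low-frequency coefficients of a global 2-D DCT over the lip region. The ROI size, LBP parameters, and number of DCT coefficients kept are assumptions; the paper's settings are not given here:

```python
# Combined local/global lip texture descriptor: uniform-LBP histogram
# (local) concatenated with a low-frequency 2-D DCT block (global).
import numpy as np
from scipy.fft import dctn
from skimage.feature import local_binary_pattern

def lip_texture_feature(roi, lbp_points=8, lbp_radius=1, dct_keep=8):
    """roi: 2-D grayscale lip image with values in [0, 1]."""
    # Local texture: histogram of uniform LBP codes (lbp_points + 2 bins).
    roi_u8 = (roi * 255).astype(np.uint8)
    lbp = local_binary_pattern(roi_u8, lbp_points, lbp_radius, method="uniform")
    n_bins = lbp_points + 2
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
    # Global texture: top-left (low-frequency) block of the 2-D DCT.
    dct_block = dctn(roi, norm="ortho")[:dct_keep, :dct_keep].ravel()
    return np.concatenate([hist, dct_block])

roi = np.random.default_rng(0).random((48, 64))   # stand-in for a lip ROI
print(lip_texture_feature(roi).shape)             # (10 + 64,) -> (74,)
```

Per-frame features of this kind would then be modeled as HMM observation sequences to capture the speech dynamics the paper credits for the best results.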
Proceedings Article · DOI · 20 Oct 2004
TL;DR: A dynamic visual feature extraction scheme captures important lip motion information for visual speech recognition; the proposed highly discriminative dynamic features, when appended to the static features, yield superior recognition performance.
Abstract: This paper presents a dynamic visual feature extraction scheme to capture important lip motion information for visual speech recognition. Discriminative projections based on a-priori chosen speech classes, phonemes and visemes, are applied to the concatenation of pre-extracted static visual features. First- and second-order temporal derivatives are subsequently extracted to further represent the dynamic differences. Experiments on a connected-digits task demonstrate that the proposed highly discriminative dynamic features, when appended to the static features, yield superior recognition performance. Compared to the commonly used delta and acceleration features, the proposed dynamic features lead to an 8% absolute improvement in word accuracy for the considered recognition task.
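For reference, the delta and acceleration baseline the paper compares against is the standard regression formula over a small frame window. A minimal sketch, with window size and feature dimension chosen for illustration:

```python
# Standard delta (first-order) and acceleration (second-order) dynamic
# features, computed by linear regression over a +/-2 frame window.
import numpy as np

def deltas(feats, window=2):
    """feats: (T, D) per-frame static features -> (T, D) delta features."""
    T = len(feats)
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, window + 1))
    return np.stack([
        sum(n * (padded[t + window + n] - padded[t + window - n])
            for n in range(1, window + 1)) / denom
        for t in range(T)
    ])

static = np.random.default_rng(0).normal(size=(100, 30))  # e.g. 30-dim lip features
delta = deltas(static)
accel = deltas(delta)                       # acceleration = delta of delta
augmented = np.hstack([static, delta, accel])
print(augmented.shape)                      # (100, 90)
```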

Network Information
Related Topics (5)
- Vocabulary: 44.6K papers, 941.5K citations, 78% related
- Feature vector: 48.8K papers, 954.4K citations, 76% related
- Feature extraction: 111.8K papers, 2.1M citations, 75% related
- Feature (computer vision): 128.2K papers, 1.7M citations, 74% related
- Unsupervised learning: 22.7K papers, 1M citations, 73% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22