Topic

Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
15 Sep 2015
TL;DR: The authors use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps where each has a different quantity of visemes ranging from two to 45.
Abstract: In machine lip-reading there is continued debate and research around the correct classes to be used for recognition. In this paper we use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps where each has a different quantity of visemes, ranging from two to 45. Viseme classes are based upon the mapping of articulated phonemes, which have been confused during phoneme recognition, into viseme groups. Using these maps, with the LiLIR dataset, we show the effect of changing the viseme map size in speaker-dependent machine lip-reading, measured by word recognition correctness, and so demonstrate that word recognition with phoneme classifiers is not just possible, but often better than word recognition with viseme classifiers. Furthermore, there are intermediate units between visemes and phonemes which are better still.

13 citations
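
The mapping idea above lends itself to a short illustration. The sketch below is a hypothetical, simplified take on deriving a phoneme-to-viseme map from phoneme confusion counts: classes are greedily merged by mutual confusion until a target viseme count remains. The function name, the greedy merge rule, and the toy confusion data are all invented for illustration; the paper's actual clustering procedure differs in detail.

```python
# Hypothetical sketch (not the paper's exact procedure): derive a
# phoneme-to-viseme map by repeatedly merging the pair of classes
# with the highest mutual confusion until a target count is reached.

def build_viseme_map(phonemes, confusions, target_classes):
    """confusions: {(phoneme_a, phoneme_b): count of times a was
    recognised as b during phoneme recognition}."""
    # Start with every phoneme in its own class.
    classes = {p: {p} for p in phonemes}

    def class_confusion(a, b):
        # Total confusion mass between two candidate classes.
        return sum(confusions.get((x, y), 0) + confusions.get((y, x), 0)
                   for x in classes[a] for y in classes[b])

    while len(classes) > target_classes:
        keys = list(classes)
        # Merge the most mutually confused pair of classes.
        a, b = max(((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
                   key=lambda pair: class_confusion(*pair))
        classes[a] |= classes.pop(b)

    # Invert to a phoneme -> viseme-label lookup.
    return {p: label for label, members in enumerate(classes.values())
            for p in members}

# Invented toy data: bilabials confuse with each other, as do
# labiodentals, so a 2-class map groups them accordingly.
confusions = {("p", "b"): 30, ("b", "m"): 25, ("f", "v"): 40}
print(build_viseme_map(["p", "b", "m", "f", "v"], confusions, 2))
# -> {'p': 0, 'b': 0, 'm': 0, 'f': 1, 'v': 1}
```

Sweeping target_classes from two up to the full phoneme inventory is what yields a family of maps of different sizes, analogous to the two-to-45 range the paper evaluates.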

Book
20 Jan 2005
TL;DR: An edited collection on audiovisual speech processing, with chapters ranging from the puzzles of multimodal speech perception and dynamic information for face perception to the temporal organization of cued speech production and audiovisual automatic speech recognition.
Abstract (table of contents):
1. Three puzzles of multimodal speech perception (R. E. Remez)
2. Visual speech perception (L. E. Bernstein)
3. Dynamic information for face perception (K. Lander and V. Bruce)
4. Investigating auditory-visual speech perception development (D. Burnham and K. Sekiyama)
5. Brain bases for seeing speech: fMRI studies of speechreading (R. Campbell and M. MacSweeney)
6. Temporal organization of cued speech production (D. Beautemps, M.-A. Cathiard, V. Attina and C. Savariaux)
7. Bimodal perception within the natural time-course of speech production (M.-A. Cathiard, A. Vilain, R. Laboissiere, H. Loevenbruck, C. Savariaux and J.-L. Schwartz)
8. Visual and audiovisual synthesis and recognition of speech by computers (N. M. Brooke and S. D. Scott)
9. Audiovisual automatic speech recognition (G. Potamianos, C. Neti, J. Luettin and I. Matthews)
10. Image-based facial synthesis (M. Slaney and C. Bregler)
11. A trainable videorealistic speech animation system (T. Ezzat, G. Geiger and T. Poggio)
12. Animated speech: research progress and applications (D. W. Massaro, M. M. Cohen, M. Tabain, J. Beskow and R. Clark)
13. Empirical perceptual-motor linkage of multimodal speech (E. Vatikiotis-Bateson and K. G. Munhall)
14. Sensorimotor characteristics of speech production (G. Bailly, P. Badin, L. Reveret and A. Ben Youssef)

13 citations

Journal ArticleDOI
TL;DR: An early (1938) study of how accurately the articulation of speech sounds can be tested.
Abstract: Accuracy in Testing the Articulation of Speech Sounds. The Journal of Educational Research, Vol. 31, No. 5 (1938), pp. 348-356.

13 citations

Journal ArticleDOI
TL;DR: The results provide arguments for the involvement of the speech motor cortex in phonological discrimination, and suggest a multimodal representation of speech units.

13 citations

Proceedings ArticleDOI
14 Mar 2010
TL;DR: By appropriately modifying techniques that have been successful in audio language identification, the work is extended to discriminating two languages in speaker-independent mode; the results indicate that even with viseme accuracy as low as about 34%, reasonable discrimination can be obtained.
Abstract: We describe experiments in visual-only language identification (VLID), in which only lip shape, appearance and motion are used to determine the language of a spoken utterance. In previous work, we had shown that this is possible in speaker-dependent mode, i.e. identifying the language spoken by a multi-lingual speaker. Here, by appropriately modifying techniques that have been successful in audio language identification, we extend the work to discriminating two languages in speaker-independent mode. Our results indicate that even with viseme accuracy as low as about 34%, reasonable discrimination can be obtained. A simulation of degraded accuracy viseme recognition performance indicates that high VLID accuracy should be achievable with viseme recognition errors of the order of 50%.

13 citations
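
The degraded-recognizer simulation described in this abstract can be caricatured in a few lines. The sketch below is not the authors' VLID system: it corrupts viseme sequences with uniform random substitutions and discriminates two synthetic "languages" using add-one-smoothed viseme-bigram models. The viseme inventory, the language generator, and all function names are invented for illustration.

```python
# Toy simulation: how does language discrimination from viseme
# sequences degrade as viseme recognition errors increase?

import math
import random
from collections import Counter

VISEMES = list("ABCDEFGH")  # invented toy viseme inventory

def make_language(rng):
    # A toy "language": a random preference over viseme bigrams.
    weights = {(a, b): rng.random() ** 3 for a in VISEMES for b in VISEMES}
    def sample(length):
        seq = [rng.choice(VISEMES)]
        while len(seq) < length:
            w = [weights[(seq[-1], b)] for b in VISEMES]
            seq.append(rng.choices(VISEMES, weights=w)[0])
        return seq
    return sample

def corrupt(seq, error_rate, rng):
    # Model viseme recognition errors as uniform substitutions.
    return [rng.choice([v for v in VISEMES if v != s])
            if rng.random() < error_rate else s
            for s in seq]

def bigram_model(seqs):
    counts = Counter(b for s in seqs for b in zip(s, s[1:]))
    total = sum(counts.values())
    vocab = len(VISEMES) ** 2
    # Add-one smoothed bigram probability.
    return lambda a, b: (counts[(a, b)] + 1) / (total + vocab)

def classify(seq, models):
    # Pick the language whose bigram model gives the higher likelihood.
    scores = [sum(math.log(m(a, b)) for a, b in zip(seq, seq[1:]))
              for m in models]
    return scores.index(max(scores))

rng = random.Random(0)
langs = [make_language(rng) for _ in range(2)]
models = [bigram_model([langs[i](200) for _ in range(50)]) for i in range(2)]

for error_rate in (0.0, 0.34, 0.5, 0.66):
    trials = [(i, corrupt(langs[i](200), error_rate, rng))
              for i in range(2) for _ in range(100)]
    acc = sum(classify(s, models) == i for i, s in trials) / len(trials)
    print(f"viseme error rate {error_rate:.2f}: {acc:.0%} language ID accuracy")
```

Even this crude setup shows the abstract's qualitative point: bigram statistics accumulated over a long utterance can separate two languages well before per-viseme recognition is anywhere near perfect.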


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations, 78% related
Feature vector: 48.8K papers, 954.4K citations, 76% related
Feature extraction: 111.8K papers, 2.1M citations, 75% related
Feature (computer vision): 128.2K papers, 1.7M citations, 74% related
Unsupervised learning: 22.7K papers, 1M citations, 73% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22