Topic

Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
01 Jan 2004
TL;DR: AVOZES is the first publicly available audio-video speech data corpus for Australian English. It contains recordings from 20 speakers, and its sequences provide both systematic coverage of the phonemes and visemes of Australian English and some application-driven utterances.
Abstract: The AVOZES data corpus has recently been made publicly available for other interested researchers. It is the first publicly available audio-video speech data corpus for Australian English. It contains recordings from 20 speakers and the sequences provide both a systematic coverage of the phonemes and visemes of Australian English as well as some application-driven utterances. AVOZES is also the first audio-video speech data corpus with stereo-video recordings, which enable a more accurate measurement of geometric facial features.

10 citations

Journal ArticleDOI
TL;DR: A detailed objective evaluation shows that a dynamic viseme-phoneme speech unit, combined with a many-to-many encoder-decoder architecture, models visual co-articulation effectively and significantly outperforms conventional phoneme-driven speech animation systems.

10 citations

Proceedings ArticleDOI
29 Mar 2018
TL;DR: This paper presents a detailed study of the machine learning approach for the real-time visual recognition of spoken words and nine different classifiers have been implemented and tested, reporting their confusion matrices among different groups of words.
Abstract: Lipreading is the process of interpreting spoken words by observing lip movement. It plays a vital role in human communication and speech understanding, especially for hearing-impaired individuals. Automated lipreading approaches have recently been used in such applications as biometric identification, silent dictation, forensic analysis of surveillance camera capture, and communication with autonomous vehicles. However, lipreading is a difficult process that poses several challenges to human- and machine-based approaches alike. This is due to the large number of phonemes in human language being visually represented by a smaller number of lip movements (visemes). Consequently, the same viseme may represent several phonemes, which confuses any lipreader. In this paper, we present a detailed study of the machine learning approach for the real-time visual recognition of spoken words. Our focus on real-time performance is motivated by the recent trend of using lipreading in autonomous vehicles. Nine different classifiers were implemented and tested, and their confusion matrices among different groups of words are reported. The three best-performing classifiers were GradientBoosting, Support Vector Machine (SVM), and logistic regression, with accuracies of 64.7%, 63.5%, and 59.4%, respectively.
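The classifier comparison described in this abstract can be sketched with scikit-learn. This is an illustrative sketch only: the synthetic dataset, feature dimensions, and five word classes below are placeholders standing in for the paper's lip-movement features, so the accuracies will not match the reported 64.7%/63.5%/59.4%.

```python
# Hedged sketch: synthetic data stands in for real viseme features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic "lip-movement feature" dataset: 500 samples, 20 features,
# 5 word classes (all sizes are illustrative assumptions).
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The three best-performing classifier families named in the abstract.
classifiers = {
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    "SVM": SVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print(name, round(accuracy_score(y_test, pred), 3))
    # Rows are true classes, columns predicted classes, as in the
    # per-word-group confusion matrices the paper reports.
    print(confusion_matrix(y_test, pred))
```

The same loop extends naturally to the other six classifiers the paper tested; only the `classifiers` dict changes.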

10 citations

Proceedings ArticleDOI
26 Sep 2010
TL;DR: An automated method predicts the word accuracy of a speech recognition system for non-native speech, in the context of speaking proficiency scoring; fluency features showed promising performance by themselves and improved overall performance in tandem with other, more traditional features.
Abstract: We have developed an automated method that predicts the word accuracy of a speech recognition system for non-native speech, in the context of speaking proficiency scoring. A model was trained using features based on speech recognizer scores, function word distributions, prosody, background noise, and speaking fluency. Since the method was implemented for non-native speech, fluency features, which have been used for non-native speakers’ proficiency scoring, were implemented along with several feature groups used from past research. The fluency features showed promising performance by themselves, and improved the overall performance in tandem with other more traditional features. A model using stepwise regression achieved a correlation with word accuracy rates of 0.76, compared to a baseline of 0.63 using only confidence scores. A binary classifier for placing utterances in high-or low-word accuracy bins achieved an accuracy of 84%, compared to a majority class baseline of 64%.
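The two evaluation setups in this abstract, regression against word accuracy and binary high/low-accuracy binning, can be sketched as follows. Everything here is a hedged illustration: the three feature columns (a confidence-style score and two fluency-style measures) and the simulated accuracies are invented stand-ins for the paper's recognizer, prosody, and fluency features, so the reported 0.76 correlation and 84% binning accuracy will not be reproduced.

```python
# Hedged sketch: simulated features and word accuracies, not the paper's data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
# Hypothetical feature columns: confidence score, speech rate, pause ratio.
features = rng.normal(size=(n, 3))
# Simulated word accuracy, partly explained by the features plus noise.
word_acc = 0.6 + 0.1 * features[:, 0] + 0.05 * features[:, 1] \
    + rng.normal(0.0, 0.05, n)
word_acc = np.clip(word_acc, 0.0, 1.0)

# Regression setup: correlate predicted with actual word accuracy
# (the paper used stepwise regression; plain least squares here).
model = LinearRegression().fit(features, word_acc)
pred = model.predict(features)
r = np.corrcoef(pred, word_acc)[0, 1]
print("correlation:", round(r, 2))

# Binary setup: place utterances in high- or low-accuracy bins
# (median split here; the paper's bin threshold is not given).
high = word_acc >= np.median(word_acc)
bin_acc = np.mean((pred >= np.median(pred)) == high)
print("binning accuracy:", round(bin_acc, 2))
```

A held-out split and the full feature groups (recognizer scores, function-word distributions, prosody, noise, fluency) would be needed to approximate the paper's actual numbers.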

10 citations


Network Information
Related Topics (5)
Vocabulary — 44.6K papers, 941.5K citations — 78% related
Feature vector — 48.8K papers, 954.4K citations — 76% related
Feature extraction — 111.8K papers, 2.1M citations — 75% related
Feature (computer vision) — 128.2K papers, 1.7M citations — 74% related
Unsupervised learning — 22.7K papers, 1M citations — 73% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  7
2022  12
2021  13
2020  39
2019  19
2018  22