Topic

Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Posted Content
12 Sep 2020
TL;DR: This work proposes a method for exaggerated visual-speech feedback in computer-assisted pronunciation training (CAPT) that outperforms the non-exaggerated version in helping learners identify and improve their pronunciation.
Abstract: To provide more discriminative feedback that helps second language (L2) learners better identify their mispronunciations, we propose a method for exaggerated visual-speech feedback in computer-assisted pronunciation training (CAPT). The speech exaggeration is realized by an emphatic speech generation neural network based on Tacotron, while the visual exaggeration is accomplished by ADC Viseme Blending, namely increasing the Amplitude of movement, extending the phone's Duration, and enhancing the color Contrast. User studies show that the exaggerated feedback outperforms the non-exaggerated version in helping learners identify and improve their pronunciation.
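Note: a minimal sketch of how the ADC-style exaggeration could be parameterized is given below. The VisemeFrame fields and the gain factors are illustrative assumptions, not the paper's published implementation.

```python
from dataclasses import dataclass, replace

@dataclass
class VisemeFrame:
    """Illustrative viseme keyframe: lip-movement amplitude (0-1),
    phone duration in seconds, and lip-region color contrast (0-1)."""
    amplitude: float
    duration: float
    contrast: float

def adc_exaggerate(frame: VisemeFrame,
                   amp_gain: float = 1.5,
                   dur_gain: float = 1.3,
                   con_gain: float = 1.2) -> VisemeFrame:
    """ADC-style exaggeration: scale Amplitude, Duration, and Contrast,
    clamping normalized values to [0, 1]. The gains are hypothetical;
    the paper does not publish specific factors."""
    return replace(
        frame,
        amplitude=min(1.0, frame.amplitude * amp_gain),
        duration=frame.duration * dur_gain,
        contrast=min(1.0, frame.contrast * con_gain),
    )

# Example: exaggerate a viseme frame for a single phone
plain = VisemeFrame(amplitude=0.6, duration=0.12, contrast=0.5)
print(adc_exaggerate(plain))
```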
Patent
Xu Haitan, K. K. Chin
07 Apr 2010
TL;DR: In this paper, a method and apparatus are presented for adapting a pattern recognition model, specifically a speech recognition model, between a first and a second environment: a model for performing pattern recognition on an inputted sequence of observations is initially provided, having been trained to recognise a pattern in a second, clean noise environment.
Abstract: The invention provides a method and apparatus for adapting a pattern recognition model, specifically a speech recognition model, between first and second environments. A model for performing pattern recognition on an inputted sequence of observations is initially provided, the model having been trained to recognise a pattern in a second, clean noise environment. The model has a plurality of parameters relating to the probability distribution of a component of a pattern being related to an observation. The model is then adapted (S59) to a first noise environment using inputted observations (S51) from the first noise environment. Adapting the model trained in the second environment to the first comprises using second-order or higher Taylor expansion coefficients derived for a group of probability distributions, wherein the same expansion is used for the whole group. The groups may be regression classes. To recognise speech, a language model is also used, with the combined likelihoods of the adapted model and the language model used to output a sequence of words identified from the input signal of the first noise environment.
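Note: the group-wise Taylor expansion described above resembles standard vector Taylor series (VTS) model compensation. The sketch below uses that textbook form; the mismatch function g and the class-shared expansion point are assumed from common VTS practice rather than copied from the patent.

```latex
% VTS-style adaptation sketch (standard form; not the patent's exact
% equations). Clean speech x and noise n combine, in the cepstral
% domain with DCT matrix C, as
%   y = x + g(x, n),   g(x, n) = C \log\!\left(1 + e^{C^{-1}(n - x)}\right).
% Expanding g around the point \mu_x^{(r)} shared by regression class r:
\mu_y \approx \mu_x + g\!\left(\mu_x^{(r)}, \mu_n\right)
      + J\left(\mu_x - \mu_x^{(r)}\right) + h\!\left(\mu_x - \mu_x^{(r)}\right),
\qquad J = \left.\frac{\partial g}{\partial x}\right|_{\mu_x^{(r)}}
% where h collects the second-order terms: per output dimension i,
%   h_i(\Delta) = \tfrac{1}{2}\,\Delta^{\top} H_i\,\Delta,
% with H_i the Hessian of g_i at the same shared expansion point.
```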
Book Chapter
01 Jan 2021
TL;DR: Preliminary results of applying rough sets in pre-processing video frames (with lip markers) of a spoken corpus, in an effort to label the phonemes spoken by the speakers, show promise in the application of a granular computing method for pre-processing large audio-video datasets.
Abstract: Machine learning algorithms are increasingly effective in algorithmic viseme recognition, which is a main component of audio-visual speech recognition (AVSR). A viseme is the smallest recognizable unit correlated with a particular realization of a given phoneme. Labelling phonemes and assigning them to viseme classes is a challenging problem in AVSR. In this paper, we present preliminary results of applying rough sets in pre-processing video frames (with lip markers) of a spoken corpus in an effort to label the phonemes spoken by the speakers. The problem addressed here is to detect and remove frames in which the shape of the lips does not fully represent a phoneme. Our results demonstrate that the silhouette score improves with rough-set-based pre-processing using unsupervised K-means clustering. In addition, features extracted by an unsupervised CNN model were used as input to the K-means clustering. The results show promise in the application of a granular computing method for pre-processing large audio-video datasets.
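Note: the clustering-and-evaluation step can be illustrated with a short sketch. The CNN features and rough-set filtering are mocked with random data here, and the cluster count is an assumed viseme-class number, not the paper's configuration.

```python
# Cluster lip-frame feature vectors with K-means and score the result
# with the silhouette coefficient, as in the evaluation described above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))  # placeholder for CNN lip features

kmeans = KMeans(n_clusters=14, n_init=10, random_state=0)  # assumed viseme count
labels = kmeans.fit_predict(features)

# A higher silhouette indicates better-separated viseme clusters; the
# paper reports this score improving after rough-set frame filtering.
print(f"silhouette: {silhouette_score(features, labels):.3f}")
```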
Posted Content
TL;DR: This work states that using appropriate speech production data could significantly improve the quality of articulatory animation for AV synthesis.
Abstract: The importance of modeling speech articulation for high-quality audiovisual (AV) speech synthesis is widely acknowledged. Nevertheless, while state-of-the-art, data-driven approaches to facial animation can make use of sophisticated motion capture techniques, the animation of the intraoral articulators (viz. the tongue, jaw, and velum) typically makes use of simple rules or viseme morphing, in stark contrast to the otherwise high quality of facial modeling. Using appropriate speech production data could significantly improve the quality of articulatory animation for AV synthesis.
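Note: for readers unfamiliar with the rule-based baseline criticized above, here is a minimal sketch of viseme morphing: each phone maps to a static articulator pose, and consecutive poses are linearly blended. The pose vectors and phone table are purely illustrative.

```python
import numpy as np

VISEME_POSES = {                       # hypothetical 3-DOF poses:
    "sil": np.array([0.0, 0.0, 0.0]),  # [jaw opening, lip rounding, tongue height]
    "AA":  np.array([0.9, 0.1, 0.2]),
    "UW":  np.array([0.3, 0.9, 0.6]),
}

def morph(prev: str, nxt: str, t: float) -> np.ndarray:
    """Blend between consecutive viseme poses; t in [0, 1] across the
    transition. Real systems layer co-articulation rules on top."""
    return (1.0 - t) * VISEME_POSES[prev] + t * VISEME_POSES[nxt]

print(morph("sil", "AA", 0.5))  # halfway between rest and the open-jaw pose
```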
Proceedings Article
24 Aug 2009
TL;DR: This study investigates the use of Cued Speech not only for perception but also for speech production by speech- or hearing-impaired individuals, and proposes an automatic recognition method based on hidden Markov models (HMMs).
Abstract: This study focuses on alternative speech communication based on Cued Speech. Cued Speech is a visual mode of communication that uses handshapes and placements in combination with the mouth movements of speech to make the phonemes of a spoken language look different from each other and clearly understandable to deaf and hearing-impaired people. Originally, the aim of Cued Speech was to overcome the problems of lip reading and thus enable deaf children and adults to fully understand spoken language. In this study, we investigate the use of Cued Speech not only for perception but also for speech production in the case of speech- or hearing-impaired individuals. The proposed method is based on hidden Markov model (HMM) automatic recognition. Automatic recognition of Cued Speech can serve as an alternative communication method for individuals with speech or hearing impairments. This article presents vowel, consonant, and isolated-word recognition experiments for Cued Speech for French. The results obtained are promising and comparable to those obtained when using the audio signal.
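Note: a minimal sketch of isolated-unit recognition with one HMM per class, in the spirit of the approach above. It assumes the hmmlearn library and mocks the hand-shape and lip features used in the paper with random vectors; model sizes are illustrative.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)

def mock_utterances(n: int, shift: float) -> list:
    """Stand-in for per-frame Cued Speech feature sequences of one class."""
    return [rng.normal(loc=shift, size=(20, 8)) for _ in range(n)]

train = {"word_a": mock_utterances(10, 0.0), "word_b": mock_utterances(10, 1.0)}

# Train one HMM per word class (Baum-Welch).
models = {}
for word, utts in train.items():
    X = np.vstack(utts)
    lengths = [len(u) for u in utts]
    m = GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
    m.fit(X, lengths)
    models[word] = m

# Classify a test utterance by maximum log-likelihood across class models.
test = rng.normal(loc=1.0, size=(20, 8))   # should match "word_b"
best = max(models, key=lambda w: models[w].score(test))
print("recognized:", best)
```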

Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (78% related)
Feature vector: 48.8K papers, 954.4K citations (76% related)
Feature extraction: 111.8K papers, 2.1M citations (75% related)
Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22