Open Access Journal Article

Comparing phonemes and visemes with DNN-based lipreading

TL;DR
The phoneme lipreading system's word accuracy outperforms the viseme-based system's word accuracy; however, the phoneme system achieved lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words.
Abstract
There is debate over whether phoneme or viseme units are the most effective for a lipreading system. Some studies use phoneme units even though phonemes describe unique short sounds; other studies have tried to improve lipreading accuracy by focusing on visemes, with varying results. We compare the performance of a lipreading system that models visual speech using either 13 viseme or 38 phoneme units, and we report the accuracy of our system at both the word and unit levels. The evaluation task is large-vocabulary continuous speech recognition using the TCD-TIMIT corpus. We model visual speech with hybrid DNN-HMMs, and our visual speech decoder is a Weighted Finite-State Transducer (WFST). We use DCT and Eigenlips as representations of the mouth ROI image. The phoneme lipreading system's word accuracy outperforms the viseme-based system's word accuracy; however, the phoneme system achieved lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words.
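As an illustration of the feature extraction mentioned above (DCT and Eigenlips representations of the mouth ROI), here is a minimal sketch, not the authors' implementation: the 32x32 ROI size, 44 DCT coefficients, 30 Eigenlips components, and all function names are illustrative assumptions, with scikit-learn PCA standing in for the Eigenlips basis.

```python
# Minimal sketch of DCT and Eigenlips (PCA) features for a mouth ROI.
# Assumptions: grayscale ROI frames resized to 32x32; 44 DCT coefficients
# and 30 Eigenlips components are illustrative choices, not the paper's.
import numpy as np
from scipy.fftpack import dct
from sklearn.decomposition import PCA

def dct_features(roi, n_coeffs=44):
    """2-D DCT of the ROI; keep the lowest-frequency coefficients,
    read row-major from the top-left block."""
    c = dct(dct(roi.astype(np.float64), axis=0, norm='ortho'), axis=1, norm='ortho')
    k = int(np.ceil(np.sqrt(n_coeffs)))
    return c[:k, :k].ravel()[:n_coeffs]

def fit_eigenlips(rois, n_components=30):
    """Fit PCA ('Eigenlips') on flattened training ROIs (n_frames x H*W)."""
    X = np.stack([r.ravel() for r in rois]).astype(np.float64)
    return PCA(n_components=n_components).fit(X)

def eigenlip_features(pca, roi):
    """Project one ROI onto the learned Eigenlips basis."""
    return pca.transform(roi.reshape(1, -1)).ravel()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.random((32, 32)) for _ in range(200)]  # stand-in for mouth ROIs
    pca = fit_eigenlips(frames)
    feat = np.concatenate([dct_features(frames[0]), eigenlip_features(pca, frames[0])])
    print(feat.shape)  # (44 + 30,) per-frame visual feature vector
```

Per-frame vectors of this kind would then be modelled by the hybrid DNN-HMMs, with the WFST decoder mapping unit-level outputs to words, as described in the abstract.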


Citations
Journal Article

Survey on automatic lip-reading in the era of deep learning

TL;DR: The survey finds that DL architectures perform similarly to traditional ones for simpler tasks but achieve significant improvements in more complex tasks, such as word or sentence recognition, with up to 40% improvement in word recognition rates.
Journal Article

Lip Reading Sentences Using Deep Learning With Only Visual Cues

TL;DR: A neural network-based lip-reading system, designed to lip-read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training, achieves significantly improved performance, with a 15% lower word error rate.
Journal Article

A Survey of Research on Lipreading Technology

TL;DR: Typical deep learning methods for lipreading are analyzed in detail according to their structural characteristics, and existing lipreading databases are listed, together with their details and the methods applied to them.
Journal Article

Deep Learning-Based Automated Lip-Reading: A Survey

TL;DR: This article presents a survey of automated lip-reading approaches, with the main focus on deep-learning-related methodologies, which have proven more fruitful for both feature extraction and classification.
Proceedings Article

Can DNNs Learn to Lipread Full Sentences?

TL;DR: This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence-to-Sequence Recurrent Neural Network (S2S RNN).
References
Journal Article

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Journal Article

Principal component analysis

TL;DR: Principal Component Analysis is a multivariate exploratory analysis method useful for separating systematic variation from noise and for defining a space of reduced dimensions that preserves the systematic variation.
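As a brief, hypothetical illustration of this idea (not taken from the paper), the sketch below generates noisy 10-dimensional data driven by two systematic directions and uses scikit-learn's PCA to recover a reduced space; the data sizes and component count are assumptions for demonstration only.

```python
# Minimal PCA sketch: project noisy 10-D data onto the 2 directions that
# capture most of the systematic variation (illustrative, not from the paper).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))                       # systematic variation
mixing = rng.normal(size=(2, 10))                        # embed into 10 dimensions
X = latent @ mixing + 0.1 * rng.normal(size=(500, 10))   # add noise

pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)                                     # reduced-dimension scores
print(pca.explained_variance_ratio_)                     # most variance in 2 components
```

In the lipreading paper above, the same projection idea underlies the Eigenlips representation of the mouth ROI.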
Proceedings Article

The Kaldi Speech Recognition Toolkit

TL;DR: The design of Kaldi, a free, open-source toolkit for speech recognition research, is described; it provides a speech recognition system based on finite-state transducers, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Journal Article

An introduction to hidden Markov models

TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
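To make this reference concrete, here is a minimal toy sketch of the HMM forward recursion for computing the likelihood of an observation sequence; the two-state model, its probabilities, and the symbol sequence are invented for illustration and are not taken from the tutorial.

```python
# Minimal forward-algorithm sketch for a discrete-observation HMM
# (illustrative toy model, not tied to the tutorial's examples).
import numpy as np

A = np.array([[0.7, 0.3],            # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],       # per-state observation probabilities
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])            # initial state distribution

def forward_likelihood(obs):
    """P(observation sequence | model) via the forward recursion."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward_likelihood([0, 1, 2, 1]))  # likelihood of a short symbol sequence
```

Hybrid DNN-HMM systems such as the one in the main paper keep this temporal structure but replace the observation probabilities with DNN-derived scores.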