Open Access Journal Article

Comparing phonemes and visemes with DNN-based lipreading

TL;DR
The phoneme lipreading system's word accuracy outperforms the viseme-based system's word accuracy; however, the phoneme system achieved lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words.
Abstract
There is debate over whether phoneme or viseme units are the most effective for a lipreading system. Some studies use phoneme units even though phonemes describe unique short sounds; other studies have tried to improve lipreading accuracy by focusing on visemes, with varying results. We compare the performance of a lipreading system that models visual speech using either 13 viseme or 38 phoneme units, and we report the accuracy of our system at both the word and unit levels. The evaluation task is large-vocabulary continuous speech recognition using the TCD-TIMIT corpus. We model visual speech with hybrid DNN-HMMs, and our visual speech decoder is a Weighted Finite-State Transducer (WFST). We use DCT and Eigenlips as representations of the mouth ROI image. The phoneme lipreading system's word accuracy outperforms the viseme-based system's word accuracy; however, the phoneme system achieved lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words.
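As an illustration of the feature extraction mentioned above (DCT and Eigenlips representations of the mouth ROI), here is a minimal sketch, not the authors' implementation: the 32x32 ROI size, 44 DCT coefficients, 30 Eigenlips components, and all function names are illustrative assumptions, with scikit-learn PCA standing in for the Eigenlips basis.

```python
# Minimal sketch of DCT and Eigenlips (PCA) features for a mouth ROI.
# Assumptions: grayscale ROI frames resized to 32x32; 44 DCT coefficients
# and 30 Eigenlips components are illustrative choices, not the paper's.
import numpy as np
from scipy.fftpack import dct
from sklearn.decomposition import PCA

def dct_features(roi, n_coeffs=44):
    """2-D DCT of the ROI; keep the lowest-frequency coefficients,
    read row-major from the top-left block."""
    c = dct(dct(roi.astype(np.float64), axis=0, norm='ortho'), axis=1, norm='ortho')
    k = int(np.ceil(np.sqrt(n_coeffs)))
    return c[:k, :k].ravel()[:n_coeffs]

def fit_eigenlips(rois, n_components=30):
    """Fit PCA ('Eigenlips') on flattened training ROIs (n_frames x H*W)."""
    X = np.stack([r.ravel() for r in rois]).astype(np.float64)
    return PCA(n_components=n_components).fit(X)

def eigenlip_features(pca, roi):
    """Project one ROI onto the learned Eigenlips basis."""
    return pca.transform(roi.reshape(1, -1)).ravel()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.random((32, 32)) for _ in range(200)]  # stand-in for mouth ROIs
    pca = fit_eigenlips(frames)
    feat = np.concatenate([dct_features(frames[0]), eigenlip_features(pca, frames[0])])
    print(feat.shape)  # (44 + 30,) per-frame visual feature vector
```

Per-frame vectors of this kind would then be modelled by the hybrid DNN-HMMs, with the WFST decoder mapping unit-level outputs to words, as described in the abstract.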


Citations
Journal Article

Survey on automatic lip-reading in the era of deep learning

TL;DR: The survey finds that DL architectures perform similarly to traditional ones for simpler tasks but achieve significant improvements in more complex tasks, such as word or sentence recognition, with up to 40% improvement in word recognition rates.
Journal Article

Lip Reading Sentences Using Deep Learning With Only Visual Cues

TL;DR: A neural network-based lip-reading system, designed to lip-read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training, achieves significantly improved performance, with a 15% lower word error rate.
Journal Article

A Survey of Research on Lipreading Technology

TL;DR: Typical deep learning methods for lipreading are analyzed in detail according to their structural characteristics, and existing lipreading databases are listed, together with their details and the methods applied to them.
Journal Article

Deep Learning-Based Automated Lip-Reading: A Survey

TL;DR: This article presents a survey of automated lip-reading approaches, with the main focus on deep-learning-related methodologies, which have proven more fruitful for both feature extraction and classification.
Proceedings Article

Can DNNs Learn to Lipread Full Sentences?

TL;DR: This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence-to-Sequence Recurrent Neural Network (S2S RNN).
References
Journal Article

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Journal Article

Principal component analysis

TL;DR: Principal Component Analysis is a multivariate exploratory analysis method useful for separating systematic variation from noise and for defining a space of reduced dimensions that preserves the systematic variation.
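As a brief, hypothetical illustration of this idea (not taken from the paper), the sketch below generates noisy 10-dimensional data driven by two systematic directions and uses scikit-learn's PCA to recover a reduced space; the data sizes and component count are assumptions for demonstration only.

```python
# Minimal PCA sketch: project noisy 10-D data onto the 2 directions that
# capture most of the systematic variation (illustrative, not from the paper).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))                       # systematic variation
mixing = rng.normal(size=(2, 10))                        # embed into 10 dimensions
X = latent @ mixing + 0.1 * rng.normal(size=(500, 10))   # add noise

pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)                                     # reduced-dimension scores
print(pca.explained_variance_ratio_)                     # most variance in 2 components
```

In the lipreading paper above, the same projection idea underlies the Eigenlips representation of the mouth ROI.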
Proceedings Article

The Kaldi Speech Recognition Toolkit

TL;DR: The design of Kaldi, a free, open-source toolkit for speech recognition research, is described; it provides a speech recognition system based on finite-state transducers, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Journal Article

An introduction to hidden Markov models

TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
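To make this reference concrete, here is a minimal toy sketch of the HMM forward recursion for computing the likelihood of an observation sequence; the two-state model, its probabilities, and the symbol sequence are invented for illustration and are not taken from the tutorial.

```python
# Minimal forward-algorithm sketch for a discrete-observation HMM
# (illustrative toy model, not tied to the tutorial's examples).
import numpy as np

A = np.array([[0.7, 0.3],            # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],       # per-state observation probabilities
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])            # initial state distribution

def forward_likelihood(obs):
    """P(observation sequence | model) via the forward recursion."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward_likelihood([0, 1, 2, 1]))  # likelihood of a short symbol sequence
```

Hybrid DNN-HMM systems such as the one in the main paper keep this temporal structure but replace the observation probabilities with DNN-derived scores.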