Open Access Journal Article
Comparing phonemes and visemes with DNN-based lipreading
TLDR
The phoneme lipreading system outperforms the viseme-based system in word accuracy; however, the phoneme system achieved lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words.
Abstract:
There is debate over whether phoneme or viseme units are more effective for a lipreading
system. Some studies use phoneme units even though phonemes describe unique short
sounds; other studies tried to improve lipreading accuracy by focusing on visemes with
varying results. We compare the performance of a lipreading system by modeling visual
speech using either 13 viseme or 38 phoneme units. We report the accuracy of our
system at both word and unit levels. The evaluation task is large vocabulary continuous
speech using the TCD-TIMIT corpus. We complete our visual speech modeling via
hybrid DNN-HMMs and our visual speech decoder is a Weighted Finite-State Transducer
(WFST). We use DCT and Eigenlips as representations of the mouth ROI image. The
phoneme lipreading system outperforms the viseme-based system in word accuracy.
However, the phoneme system achieved lower accuracy at the unit level, which
shows the importance of the dictionary for decoding classification outputs into words.
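One plausible way to see why the dictionary matters is that a many-to-one phoneme-to-viseme mapping makes distinct words share the same unit sequence. The sketch below is a toy illustration only: the mapping `PHONEME_TO_VISEME`, the viseme labels, and the six-word `LEXICON` are hypothetical and are not the 13-viseme set or the TCD-TIMIT dictionary used in the paper.

```python
from collections import defaultdict

# Hypothetical many-to-one phoneme-to-viseme map (illustrative only;
# NOT the 13-viseme set from the paper).
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar", "n": "V_alveolar",
    "ae": "V_open",
}

# Hypothetical toy pronunciation dictionary: word -> phoneme sequence.
LEXICON = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "bad": ["b", "ae", "d"],
    "fan": ["f", "ae", "n"],
    "van": ["v", "ae", "n"],
}


def to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def group_by_units(lexicon, key):
    """Group words that share the same unit sequence under `key`."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[key(phonemes)].append(word)
    return dict(groups)


# With phoneme units every word has its own sequence.
phoneme_groups = group_by_units(LEXICON, tuple)
# With viseme units several words collapse onto one sequence.
viseme_groups = group_by_units(LEXICON, to_visemes)

for units, words in viseme_groups.items():
    print(units, "->", sorted(words))
```

In this toy lexicon every word has a distinct phoneme sequence, while the viseme sequences collapse the six words into two groups, so a decoder working on viseme units alone could not tell the words within a group apart and would have to rely on context.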
Citations
Journal Article
Survey on automatic lip-reading in the era of deep learning
TL;DR: It is found that DL architectures perform similarly to traditional ones for simpler tasks but report significant improvements in more complex tasks, such as word or sentence recognition, with up to 40% improvement in word recognition rates.
Journal Article
Lip Reading Sentences Using Deep Learning With Only Visual Cues
TL;DR: A neural network-based lip reading system designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training has achieved a significantly improved performance with 15% lower word error rate.
Journal Article
A Survey of Research on Lipreading Technology
TL;DR: Typical deep learning methods on lipreading are analyzed in detail according to their structural characteristics, and existing lipreading databases are listed, including their detailed information and the methods applied to these databases.
Journal Article
Deep Learning-Based Automated Lip-Reading: A Survey
TL;DR: A survey on automated lip-reading approaches is presented in this article with the main focus being on deep learning related methodologies which have proven to be more fruitful for both feature extraction and classification.
Proceedings Article
Can DNNs Learn to Lipread Full Sentences
TL;DR: In this article, state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence to Sequence Recurrent Neural Network (S2S RNN) were explored.
References
Journal Article
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdelrahman Mohamed, Navdeep Jaitly, Andrew W. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Journal Article
Principal component analysis
TL;DR: Principal Component Analysis is a multivariate exploratory analysis method useful to separate systematic variation from noise and to define a space of reduced dimensions that preserves the systematic variation.
Proceedings Article
The Kaldi Speech Recognition Toolkit
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Kumar Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, Karel Vesely
TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Journal Article
An introduction to hidden Markov models
TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.