Proceedings Article
Visual speech recognition with loosely synchronized feature streams
Kate Saenko, Karen Livescu, Michael R. Siracusa, Kevin W. Wilson, James Glass, Trevor Darrell
Vol. 2, pp. 1424-1431
TL;DR: A novel dynamic Bayesian network with a multi-stream structure and observations consisting of articulatory feature classifier scores, which can model varying degrees of co-articulation in a principled way, is presented.
Abstract:
We present an approach to detecting and recognizing spoken isolated phrases based solely on visual input. We adopt an architecture that first employs discriminative detection of visual speech and articulatory features, and then performs recognition using a model that accounts for the loose synchronization of the feature streams. Discriminative classifiers detect the subclass of lip appearance corresponding to the presence of speech, and further decompose it into features corresponding to the physical components of articulatory production. These components often evolve in a semi-independent fashion, and conventional viseme-based approaches to recognition fail to capture the resulting co-articulation effects. We present a novel dynamic Bayesian network with a multi-stream structure and observations consisting of articulatory feature classifier scores, which can model varying degrees of co-articulation in a principled way. We evaluate our visual-only recognition system on a command utterance task. We show comparative results on lip detection and speech/non-speech classification, as well as recognition performance against several baseline systems.
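The idea of loosely synchronized feature streams can be illustrated with a simplified decoder. The sketch below is not the paper's dynamic Bayesian network; it is a hypothetical Viterbi search (the function name `viterbi_async` and the input layout `log_obs` are illustrative assumptions) in which each feature stream moves through its own left-to-right chain of states for a phrase, the per-frame observation score is the sum of the per-stream classifier log-scores, and the streams may drift apart by at most `max_async` states. The fixed stay/advance log-probabilities and the hard asynchrony bound are also assumptions made for illustration.

```python
import itertools
import math


def viterbi_async(log_obs, max_async, log_stay=math.log(0.5), log_adv=math.log(0.5)):
    """Best log-score for aligning T frames to an N-state left-to-right chain
    in each of K streams, with streams at most max_async states apart.

    log_obs[k][t][i] is a (hypothetical) log-score from stream k's classifier
    at frame t for the feature value associated with that stream's state i.
    """
    K = len(log_obs)
    T = len(log_obs[0])
    N = len(log_obs[0][0])

    def emit(state, t):
        # Streams are treated as conditionally independent given the joint
        # state, so their classifier log-scores simply add.
        return sum(log_obs[k][t][i] for k, i in enumerate(state))

    start = (0,) * K
    best = {start: emit(start, 0)}
    for t in range(1, T):
        new = {}
        for state, score in best.items():
            # Each stream independently stays (0) or advances (1) one state.
            for moves in itertools.product((0, 1), repeat=K):
                nxt = tuple(i + m for i, m in zip(state, moves))
                # Reject moves past the last state or beyond the asynchrony bound.
                if max(nxt) >= N or max(nxt) - min(nxt) > max_async:
                    continue
                trans = sum(log_adv if m else log_stay for m in moves)
                cand = score + trans + emit(nxt, t)
                if cand > new.get(nxt, -math.inf):
                    new[nxt] = cand
        best = new
    # Every stream must finish in its final state.
    return best.get((N - 1,) * K, -math.inf)


if __name__ == "__main__":
    # Toy scores: stream 0 (e.g. lip opening) changes state at frame 1,
    # stream 1 (e.g. lip rounding) changes one frame later.
    obs = [
        [[0, -5], [-5, 0], [-5, 0]],
        [[0, -5], [0, -5], [-5, 0]],
    ]
    print(viterbi_async(obs, max_async=0))
    print(viterbi_async(obs, max_async=1))
```

Setting `max_async=0` forces all streams to change state together, much like a single viseme-style chain; in the toy example the second call scores 5 higher because stream 1 is allowed to change state one frame after stream 0. A probabilistic model could instead place a learned distribution over the degree of asynchrony rather than a hard cutoff.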
Citations
Proceedings Article
"Hello! My name is... Buffy" - Automatic Naming of Characters in TV Video
TL;DR: It is demonstrated that high precision can be achieved by combining multiple sources of information, both visual and textual, by automatic generation of time stamped character annotation by aligning subtitles and transcripts.
Proceedings Article
Hidden Conditional Random Fields for Gesture Recognition
TL;DR: This paper derives a discriminative sequence model with a hidden state structure, and demonstrates its utility both in a detection and in a multi-way classification formulation.
Journal Article
Lipreading With Local Spatiotemporal Descriptors
TL;DR: Local spatiotemporal descriptors are presented to represent and recognize spoken isolated phrases based solely on visual input; the descriptors provide local processing and robustness to monotonic gray-scale changes.
Journal Article
Taking the bite out of automated naming of characters in TV video
TL;DR: It is demonstrated that high precision can be achieved by combining multiple sources of information, both visual and textual, by automatic generation of time stamped character annotation by aligning subtitles and transcripts.
Journal Article
Speech production knowledge in automatic speech recognition.
TL;DR: A survey of a growing body of work in which representations of speech production are used to improve automatic speech recognition is provided.
References
Journal Article
Maximum likelihood from incomplete data via the EM algorithm
Journal Article
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Dynamic bayesian networks: representation, inference and learning
Kevin Murphy, Stuart Russell
TL;DR: This thesis will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
Journal Article
Factorial Hidden Markov Models
TL;DR: A generalization of HMMs in which the state is factored into multiple state variables and is therefore represented in a distributed manner, together with a structured approximation in which the state variables are decoupled, yielding a tractable algorithm for learning the parameters of the model.
Related Papers
Rapid object detection using a boosted cascade of simple features
Paul A. Viola, Michael Jones