Open Access Journal Article

A support vector machine-based dynamic network for visual speech recognition applications

TLDR
This paper examines the suitability of support vector machines for visual speech recognition. Each word is modeled as a temporal sequence of visemes, with one support vector machine trained per viseme, and the temporal character of speech is captured by integrating the support vector machines as nodes in a Viterbi lattice.
Abstract
Visual speech recognition is an emerging research field. In this paper, we examine the suitability of support vector machines for visual speech recognition. Each word is modeled as a temporal sequence of visemes corresponding to the different phones realized. One support vector machine is trained to recognize each viseme and its output is converted to a posterior probability through a sigmoidal mapping. To model the temporal character of speech, the support vector machines are integrated as nodes into a Viterbi lattice. We test the performance of the proposed approach on a small visual speech recognition task, namely the recognition of the first four digits in English. The word recognition rate obtained is at the level of the previous best reported rates.
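The pipeline described in the abstract (per-viseme SVM outputs, a sigmoidal mapping to posterior probabilities, and a Viterbi lattice over each word's viseme sequence) can be illustrated with a minimal sketch. This is not the authors' implementation. The sigmoid parameters `a` and `b`, the left-to-right "stay or advance" transitions with equal weight, the viseme labels `"A"` and `"B"`, and the two-word lexicon are all assumptions made for illustration only.

```python
import math

def sigmoid_posterior(f, a=-1.0, b=0.0):
    """Map a raw SVM decision value f to (0, 1) via 1 / (1 + exp(a*f + b)).

    The values of a and b are assumed here; in practice they would be fitted.
    """
    return 1.0 / (1.0 + math.exp(a * f + b))

def viterbi_word_score(frames, visemes):
    """Best log-score of aligning the frames to one word's viseme sequence.

    frames:  list of dicts {viseme_label: svm_decision_value}, one per video frame
    visemes: list of viseme labels making up the word, in order
    Assumes a left-to-right lattice: each frame either stays in the current
    viseme or advances to the next one (hypothetical topology).
    """
    n = len(visemes)
    neg_inf = float("-inf")
    best = [neg_inf] * n
    # The path must start in the first viseme of the word.
    best[0] = math.log(sigmoid_posterior(frames[0][visemes[0]]))
    for frame in frames[1:]:
        new = [neg_inf] * n
        for j, v in enumerate(visemes):
            stay = best[j]
            advance = best[j - 1] if j > 0 else neg_inf
            prev = max(stay, advance)
            if prev > neg_inf:
                new[j] = prev + math.log(sigmoid_posterior(frame[v]))
        best = new
    # The path must end in the last viseme of the word.
    return best[-1]

def recognize(frames, lexicon):
    """Return the word whose viseme lattice gives the highest Viterbi score."""
    return max(lexicon, key=lambda w: viterbi_word_score(frames, lexicon[w]))

# Hypothetical data: two viseme classes and two words.
lexicon = {"w1": ["A", "B"], "w2": ["B", "A"]}
frames = [
    {"A": 2.0, "B": -2.0},
    {"A": 1.5, "B": -1.0},
    {"A": -1.0, "B": 2.0},
]
best_word = recognize(frames, lexicon)
```

Working in the log domain turns the product of per-frame posteriors along a path into a sum, which avoids numerical underflow on longer sequences. A word with more visemes than available frames receives a score of negative infinity under this topology.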


Citations
Journal Article

Recent advances in the automatic recognition of audiovisual speech

TL;DR: The main components of audiovisual automatic speech recognition (ASR) are reviewed, and novel contributions in two main areas are presented: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and second, audiovisual speech integration.
Journal Article

Audio-Visual Biometrics

TL;DR: The main components of audio-visual biometric systems are described, existing systems and their performance are reviewed, and future research and development directions in this area are discussed.
Proceedings Article

Visual speech recognition with loosely synchronized feature streams

TL;DR: A novel dynamic Bayesian network is presented, with a multi-stream structure and observations consisting of articulatory feature classifier scores, which can model varying degrees of co-articulation in a principled way.

Joint audio-visual speech processing for recognition and enhancement.

TL;DR: Two general approaches that utilize visual speech to improve ASR in acoustically challenging environments are reviewed: one directly combines features extracted from the acoustic and visual channels, aiming at superior recognition performance of the resulting audio-visual ASR system, and the other seeks to eliminate the noise present in the acoustic features, resulting in improved speech recognition.
Proceedings Article

Articulatory features for robust visual speech recognition

TL;DR: A novel approach to visual speech modeling based on articulatory features, which has potential benefits under visually challenging conditions, is presented and evaluated in a preliminary experiment on a small audio-visual database.
References

Statistical learning theory

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
Journal Article

A Tutorial on Support Vector Machines for Pattern Recognition

TL;DR: Several arguments supporting the observed high accuracy of SVMs are reviewed, and numerous examples and proofs of most of the key theorems are given.
Book

Probability, random variables and stochastic processes

TL;DR: This chapter discusses the concept of a Random Variable, the meaning of Probability, and the axioms of probability in terms of Markov Chains and Queueing Theory.
Book

An Introduction to Support Vector Machines and Other Kernel-based Learning Methods

TL;DR: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory, and will guide practitioners to updated literature, new applications, and on-line software.