Proceedings Article

An improved automatic lipreading system to enhance speech recognition

TLDR
An improved version of a previously described automatic lipreading system has been developed which uses vector quantization, dynamic time warping, and a new heuristic distance measure to improve acoustic speech recognition.
Abstract
Current acoustic speech recognition technology performs well with very small vocabularies in noise or with large vocabularies in very low noise. Accurate acoustic speech recognition in noise with vocabularies over 100 words has yet to be achieved. Humans frequently lipread the visible facial speech articulations to enhance speech recognition, especially when the acoustic signal is degraded by noise or hearing impairment. Automatic lipreading has been found to significantly improve acoustic speech recognition and could be advantageous in noisy environments such as offices, aircraft, and factories.

An improved version of a previously described automatic lipreading system has been developed which uses vector quantization, dynamic time warping, and a new heuristic distance measure. This paper presents visual speech recognition results from multiple speakers under optimal conditions. Results from combined acoustic and visual speech recognition are also presented and show significantly improved performance compared to the acoustic recognition system alone.
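The abstract names vector quantization and dynamic time warping but gives no implementation details, so the following is only a minimal sketch of how the two techniques are commonly combined for template matching. The function names (`quantize`, `dtw_distance`, `recognize`), the toy codebook, and the use of plain Euclidean distance are all assumptions made for illustration. In particular, Euclidean distance stands in for the paper's "new heuristic distance measure", which the abstract does not specify.

```python
import math


def quantize(frame, codebook):
    """Vector quantization: index of the codebook vector nearest to `frame`.

    Euclidean distance is an assumption, not the paper's measure.
    """
    return min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))


def dtw_distance(seq_a, seq_b, codebook):
    """Dynamic time warping cost between two sequences of codebook indices.

    Uses the basic unconstrained recurrence (no slope restriction).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between the two codewords (stand-in measure).
            local = math.dist(codebook[seq_a[i - 1]], codebook[seq_b[j - 1]])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_b frame repeated
                cost[i][j - 1],      # seq_a frame repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognize(frames, templates, codebook):
    """Quantize the input frames, then return the template word with lowest DTW cost."""
    codes = [quantize(f, codebook) for f in frames]
    return min(templates, key=lambda w: dtw_distance(codes, templates[w], codebook))
```

Here `frames` would be a sequence of per-image feature vectors and `templates` a hypothetical dictionary mapping each vocabulary word to a stored sequence of codebook indices. Quantizing first keeps the local distance computation inside DTW to a lookup between a small number of codewords, which is one common motivation for pairing the two techniques.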


Citations
Book

Survey of the State of the Art in Human Language Technology

R. Cole
TL;DR: In this article, the authors present a glossary for language analysis and understanding in the context of spoken language input and output technologies, and evaluate their work with a set of annotated corpora.
Journal Article

Extraction of visual features for lipreading

TL;DR: Three methods for parameterizing lip image sequences for recognition using hidden Markov models are compared. Two are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape, or of shape and appearance, respectively.
Journal Article

Motion-based recognition: a survey

TL;DR: A review of recent developments in the computer vision aspect of motion-based recognition, covering several methods for the recognition of objects and motions, including cyclic motion detection and recognition, lipreading, hand gesture interpretation, motion verb recognition, and temporal texture classification.
Proceedings Article

"Eigenlips" for robust speech recognition

TL;DR: This study improves the performance of a hybrid connectionist speech recognition system by incorporating visual information about the corresponding lip movements, using a new visual front end and an alternative architecture for combining the visual and acoustic information.
Proceedings Article

CUAVE: A new audio-visual database for multimodal human-computer interface research

TL;DR: A new audio-visual database is introduced that is flexible and fairly comprehensive, yet easily available to researchers on one DVD; it includes pairs of simultaneous speakers, making it the first documented database of this kind.
References
Journal Article

Dynamic programming algorithm optimization for spoken word recognition

TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.

Automatic lipreading to enhance speech recognition (speech reading)

TL;DR: An automatic lipreading system has been developed, and the combination of the acoustic and visual recognition candidates is shown to yield a final recognition accuracy that greatly exceeds the acoustic recognition accuracy alone.
Journal Article

Vector quantization: A pattern-matching technique for speech coding

TL;DR: Recent results obtained in waveform coding of speech with vector quantization are reviewed, with vector quantization appearing to be a suitable coding technique that caters to this dual requirement of effective speech coding.
Journal Article

Coding of Two-Tone Images

TL;DR: The concepts and techniques of efficient coding for the transmission or storage of two-tone images, such as business documents and weather maps, are reviewed.
Proceedings Article

Coding Of Two-Tone Images

TL;DR: This work gives a brief overview of efficient coding methods for two-tone images, especially white block skipping and run-length coding.