An improved automatic lipreading system to enhance speech recognition

doi:10.1145/57167.57170

Proceedings ArticleDOI

E. Petajan, +3 more

- pp 19-25

TLDR

An improved version of a previously described automatic lipreading system has been developed which uses vector quantization, dynamic time warping, and a new heuristic distance measure to improve acoustic speech recognition.

Abstract:

Current acoustic speech recognition technology performs well with very small vocabularies in noise or with large vocabularies in very low noise. Accurate acoustic speech recognition in noise with vocabularies over 100 words has yet to be achieved. Humans frequently lipread the visible facial speech articulations to enhance speech recognition, especially when the acoustic signal is degraded by noise or hearing impairment. Automatic lipreading has been found to significantly improve acoustic speech recognition and could be advantageous in noisy environments such as offices, aircraft and factories. An improved version of a previously described automatic lipreading system has been developed which uses vector quantization, dynamic time warping, and a new heuristic distance measure. This paper presents visual speech recognition results from multiple speakers under optimal conditions. Results from combined acoustic and visual speech recognition are also presented, which show significantly improved performance compared to the acoustic recognition system alone.
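The abstract names vector quantization and dynamic time warping (DTW) but gives no implementation details. The following is a minimal sketch of how those two techniques are commonly combined for isolated-word template matching, not the authors' system. Everything in it is an assumption for illustration: the function names (`quantize`, `dtw_distance`, `recognize`), the tiny one-dimensional features, and the plain Euclidean local distance, which stands in for the paper's heuristic distance measure (not described on this page).

```python
import math

def quantize(frames, codebook):
    """Vector quantization: map each feature vector to the index of its
    nearest codebook entry (Euclidean distance)."""
    return [
        min(range(len(codebook)), key=lambda k: math.dist(f, codebook[k]))
        for f in frames
    ]

def dtw_distance(a, b, dist=math.dist):
    """Classic DTW cost between two sequences of feature vectors.

    Uses the basic step pattern (match, insertion, deletion) with no
    slope constraint. `dist` is the local frame-to-frame distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

def recognize(frames, templates, codebook):
    """Return the label of the template with the lowest DTW cost.

    Frames and templates are both replaced by their nearest codebook
    vectors before alignment. `templates` maps label -> frame sequence.
    """
    def to_codewords(seq):
        return [codebook[k] for k in quantize(seq, codebook)]

    query = to_codewords(frames)
    return min(
        templates,
        key=lambda label: dtw_distance(query, to_codewords(templates[label])),
    )

# Hypothetical toy data: 1-D "mouth opening" features per video frame.
codebook = [(0.0,), (1.0,), (2.0,)]
templates = {
    "open-close": [(0.0,), (2.0,), (0.0,)],
    "stay-open": [(2.0,), (2.0,), (2.0,)],
}
utterance = [(0.1,), (1.9,), (2.1,), (0.2,)]  # slower "open-close"
print(recognize(utterance, templates, codebook))
```

DTW absorbs the difference in speaking rate between the four-frame utterance and the three-frame template, while quantization reduces each frame to one of a small set of codebook vectors so that local distances could in principle be precomputed in a table.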

Citations

Book

Survey of the State of the Art in Human Language Technology

R. Cole

TL;DR: In this article, the authors present a glossary for language analysis and understanding in the context of spoken language input and output technologies, and evaluate their work with a set of annotated corpora.

Journal ArticleDOI

Extraction of visual features for lipreading

Iain Matthews, +4 more

- 01 Feb 2002 -

IEEE Transactions on Pattern Analysis an...

TL;DR: Three methods for parameterizing lip image sequences for recognition using hidden Markov models are compared. Two are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape, or of shape and appearance, respectively.

Journal ArticleDOI

Motion-based recognition: a survey

Claudette Cédras, +1 more

- 01 Mar 1995 -

Image and Vision Computing

TL;DR: A review of recent developments in the computer vision aspect of motion-based recognition. Several methods for the recognition of objects and motions are reported, including cyclic motion detection and recognition, lipreading, hand gesture interpretation, motion verb recognition and temporal texture classification.

Proceedings ArticleDOI

"Eigenlips" for robust speech recognition

Christoph Bregler, +1 more

TL;DR: This study improves the performance of a hybrid connectionist speech recognition system by incorporating visual information about the corresponding lip movements by using a new visual front end, and an alternative architecture for combining the visual and acoustic information.

Proceedings ArticleDOI

CUAVE: A new audio-visual database for multimodal human-computer interface research

Eric Patterson, +3 more

TL;DR: A new audiovisual database is introduced that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. It includes pairs of simultaneous speakers and is the first documented database of this kind.

References

Journal ArticleDOI

Dynamic programming algorithm optimization for spoken word recognition

H. Sakoe, +1 more

- 01 Feb 1978 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.

Automatic lipreading to enhance speech recognition (speech reading)

Eric David Petajan

TL;DR: An automatic lipreading system has been developed, and the combination of the acoustic and visual recognition candidates is shown to yield a final recognition accuracy which greatly exceeds the acoustic recognition accuracy alone.

Journal ArticleDOI

Vector quantization: A pattern-matching technique for speech coding

Allen Gersho, +1 more

- 01 Dec 1983 -

IEEE Communications Magazine

TL;DR: Recent results obtained in waveform coding of speech with vector quantization are reviewed, with vector quantization appearing to be a suitable technique for effective speech coding.

Journal ArticleDOI

Coding of Two-Tone Images

Thomas S. Huang

- 01 Jan 1977 -

IEEE Transactions on Communications

TL;DR: The concepts and techniques of efficient coding for the transmission or storage of two-tone images, such as business documents and weather maps, are reviewed.

Proceedings ArticleDOI

Coding Of Two-Tone Images

Thomas S. Huang

TL;DR: This work gives a brief overview of efficient coding methods for two-tone images, in particular white block skipping and run-length coding.

Related Papers (5)

Hearing lips and seeing voices

Automatic lipreading to enhance speech recognition (speech reading)

"Eigenlips" for robust speech recognition

Snakes : Active Contour Models

Fundamentals of speech recognition