Topic

Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Patent
08 Feb 1996
TL;DR: In this article, a method and device for assessing the quality of speech are presented: the speech to be evaluated is listened to by a person who reproduces it, and the stops of vowel sounds in the produced and reproduced speech, respectively, are identified.
Abstract: The present invention refers to a method and device for assessing the quality of speech. The speech to be evaluated is listened to by a person who reproduces it. The stops of vowel sounds in the produced and reproduced speech, respectively, are identified, and the difference between corresponding stops is registered. An average value is computed from the obtained differences; this average indicates the quality of the produced speech. The invention can be used to evaluate different speech-producing sources, such as equipment and/or machines, as well as people's ability to comprehend the speech.
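As a rough illustration of the comparison described above, the sketch below computes the average timing difference between corresponding vowel-sound stops. The function name, the one-to-one alignment of stops, and the representation of stops as timestamps in seconds are assumptions for illustration, not details from the patent.

# Minimal sketch, assuming vowel-sound stops are available as aligned
# timestamp lists (seconds) for the produced and reproduced speech.
def speech_quality_score(produced_stops, reproduced_stops):
    """Average timing difference between corresponding vowel-sound stops.

    A smaller value suggests the produced speech was easier to follow.
    """
    if len(produced_stops) != len(reproduced_stops):
        raise ValueError("stop lists must align one-to-one")
    diffs = [abs(p - r) for p, r in zip(produced_stops, reproduced_stops)]
    return sum(diffs) / len(diffs)

# Example: three vowel stops, each reproduced slightly late.
print(speech_quality_score([0.42, 1.10, 1.85], [0.45, 1.16, 1.93]))  # ~0.057 s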

5 citations

Book
08 Jun 2012

5 citations

Proceedings ArticleDOI
25 Aug 2017
TL;DR: In this paper, the authors compare the performance of the most widely used features for lipreading, Discrete Cosine Transform (DCT) and Active Appearance Models (AAM), in a traditional Hidden Markov Model (HMM) framework.
Abstract: Automatic lipreading has major potential impact for speech recognition, supplementing and complementing the acoustic modality. Most attempts at lipreading have been performed on small vocabulary tasks, due to a shortfall of appropriate audio-visual datasets. In this work we use the publicly available TCD-TIMIT database, designed for large vocabulary continuous audio-visual speech recognition. We compare the viseme recognition performance of the most widely used features for lipreading, Discrete Cosine Transform (DCT) and Active Appearance Models (AAM), in a traditional Hidden Markov Model (HMM) framework. We also exploit recent advances in AAM fitting. We found the DCT to outperform AAM by more than 6% for a viseme recognition task with 56 speakers. The overall accuracy of the DCT is quite low (32-34%). We conclude that a fundamental rethink of the modelling of visual features may be needed for this task.
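For readers unfamiliar with DCT-based lipreading features, the sketch below extracts a per-frame feature vector as the low-frequency coefficients of a 2-D DCT over a grayscale mouth region. The ROI size, the 8x8 coefficient block, and the 44-coefficient cut-off are illustrative assumptions, not the paper's exact configuration.

# Sketch of 2-D DCT features for a mouth-region frame, as one input type
# compared in the paper; the retained-coefficient scheme is an assumption.
import numpy as np
from scipy.fft import dct

def dct_lip_features(mouth_roi, n_coeffs=44):
    """mouth_roi: 2-D grayscale array of the mouth region (e.g. 32x64)."""
    # Separable 2-D DCT (type II, orthonormal).
    coeffs = dct(dct(mouth_roi.astype(float), axis=0, norm="ortho"),
                 axis=1, norm="ortho")
    # Keep the top-left (low-frequency) block as the feature vector.
    return coeffs[:8, :8].ravel()[:n_coeffs]

frame = np.random.rand(32, 64)          # stand-in for one mouth-region frame
features = dct_lip_features(frame)      # 44-D feature vector per frame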

5 citations

19 Dec 2015
TL;DR: A feature-fusion audio-visual speech recognition system is described that extracts lip geometry from the mouth region using a combination of a skin color filter, border following, and convex hull, and performs classification with a Hidden Markov Model.
Abstract: Humans are often able to compensate for noise degradation and uncertainty in speech information by augmenting the received audio with visual information. Such bimodal perception generates a rich combination of information that can be used in the recognition of speech. However, due to wide variability in the lip movement involved in articulation, not all speech can be substantially improved by audio-visual integration. This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of a skin color filter, border following, and convex hull, and performs classification using a Hidden Markov Model. The new approach is compared with a conventional audio-only system under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the audio-visual approach significantly improves speech recognition accuracy compared with the audio-only approach.
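A rough OpenCV sketch of the lip-geometry pipeline named in the abstract (skin color filter, border following, convex hull) is given below. The HSV thresholds and the particular geometric descriptors returned are assumptions for illustration, not values from the paper.

# Sketch of lip-geometry extraction from a mouth-region image.
# cv2.findContours implements Suzuki border following; the skin-colour
# range below is an assumed, illustrative threshold.
import cv2
import numpy as np

def lip_geometry(mouth_bgr):
    hsv = cv2.cvtColor(mouth_bgr, cv2.COLOR_BGR2HSV)
    # Suppress skin pixels so the (redder, darker) lip region remains.
    skin = cv2.inRange(hsv, (0, 30, 80), (25, 160, 255))   # assumed range
    lips = cv2.bitwise_not(skin)
    contours, _ = cv2.findContours(lips, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(largest)
    x, y, w, h = cv2.boundingRect(hull)
    # Simple geometric descriptors: width, height, aspect ratio, hull area.
    return np.array([w, h, w / max(h, 1), cv2.contourArea(hull)], float)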

5 citations

Patent
24 Jul 2009
TL;DR: In this article, the authors identify speech sound features using a multi-dimensional analysis that analyzes the time, frequency, and intensity at which a feature occurs within a speech sound, and the contribution of the feature to the sound.
Abstract: Methods and systems of identifying speech sound features within a speech sound are provided. The sound features may be identified using a multi-dimensional analysis that analyzes the time, frequency, and intensity at which a feature occurs within a speech sound, and the contribution of the feature to the sound. Information about sound features may be used to enhance spoken speech sounds to improve recognizability of the speech sounds by a listener.
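As a hedged illustration of the kind of time-frequency-intensity analysis the abstract describes, the sketch below locates the strongest time-frequency bins in a speech sound's spectrogram. The spectrogram settings and the simple peak-ranking rule are assumptions made only for this example, not the patent's method.

# Sketch: report the time (s), frequency (Hz), and intensity (dB) of the
# most prominent spectrogram bins in a speech sound.
import numpy as np
from scipy.signal import spectrogram

def prominent_sound_features(samples, sample_rate, top_n=5):
    freqs, times, power = spectrogram(samples, fs=sample_rate,
                                      nperseg=512, noverlap=256)
    power_db = 10 * np.log10(power + 1e-12)
    # Rank time-frequency bins by intensity and report the strongest ones.
    top = np.argsort(power_db, axis=None)[::-1][:top_n]
    f_idx, t_idx = np.unravel_index(top, power_db.shape)
    return [(times[t], freqs[f], power_db[f, t]) for f, t in zip(f_idx, t_idx)]

# Example with a synthetic 440 Hz tone.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
for time_s, freq_hz, level_db in prominent_sound_features(np.sin(2 * np.pi * 440 * t), sr):
    print(f"{time_s:.2f} s, {freq_hz:.0f} Hz, {level_db:.1f} dB")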

5 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
78% related
Feature vector
48.8K papers, 954.4K citations
76% related
Feature extraction
111.8K papers, 2.1M citations
75% related
Feature (computer vision)
128.2K papers, 1.7M citations
74% related
Unsupervised learning
22.7K papers, 1M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22