Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic receiving 17889 citations.


Papers
01 Jan 1995
TL;DR: The phone duration percentile, a comparison of measured versus expected phone duration, is shown to be robust with respect to lexical content and consistent with previous findings about the statistics of long-term and short-term speech rate.
Abstract: This report describes a series of experiments that measure speech rate and that attempt to improve speech recognition accuracy for rapidly spoken speech. Descriptions of several measures of speech rate are presented, with their advantages and disadvantages. Speech recognition results obtained using several compensation methods are compared to identify methods by which compensation for the effects of fast speech may yield the greatest improvement in recognition accuracy. Very simple measures of speech rate such as the word rate or phone rate are found to be unsuitable for detection of both long-term and short-term speech rate since they are sensitive to the lexical content of speech. In contrast, the phone duration percentile, a comparison of measured versus expected phone duration, is shown to be robust with respect to lexical content and consistent with previous findings about the statistics of long-term and short-term speech rate. Using this metric, speakers with a speech rate in the top 30% are found to produce a 50 to 150% increase in word error rate. The compensation techniques explored involve modifications to five components of the recognition system: the models of the acoustical characteristics of speech sounds, the models of the HMM state-transition probabilities, the pronunciations of words in the dictionary, the weight with which acoustic and linguistic evidence are combined, and the base phone set. Optimizing the language weight reduced the word error rate of fast speech by 10.3% relative to baseline performance. Adapting the state-transition probabilities to fast speech reduced the word error rate for fast speech by 2.6%. Using one of the modified pronunciation dictionaries reduced the word error rate of fast speech by 2.6%. The other techniques yielded little or no reduction in the word error rate.
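
The abstract describes the phone duration percentile only as "a comparison of measured versus expected phone duration", so the following is a minimal sketch of one plausible reading, not the report's actual definition. It assumes that "expected" durations come from a per-phone sample of reference durations, and that an utterance is scored by averaging the percentiles of its phones. The function names, phone labels, and duration values are all hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def duration_percentile(measured, reference_durations):
    """Fraction of reference durations that are <= the measured duration.

    Values near 0 mean the phone was shorter than almost all reference
    examples (fast speech); values near 1 mean it was longer (slow speech).
    """
    ordered = sorted(reference_durations)
    return bisect_right(ordered, measured) / len(ordered)


def utterance_rate_score(phones, reference):
    """Mean duration percentile over (phone_label, duration) pairs.

    Averaging per-phone percentiles is an assumption made for this sketch;
    it keeps the score independent of which phones the utterance contains.
    """
    return mean(duration_percentile(d, reference[p]) for p, d in phones)


# Hypothetical reference durations in seconds, keyed by phone label.
reference = {
    "AA": [0.06, 0.08, 0.10, 0.12, 0.14],
    "T": [0.02, 0.03, 0.04, 0.05, 0.06],
}

fast = [("AA", 0.06), ("T", 0.02)]  # every phone at the short end
slow = [("AA", 0.14), ("T", 0.06)]  # every phone at the long end

print(utterance_rate_score(fast, reference))
print(utterance_rate_score(slow, reference))
```

Because each phone is compared only against reference durations for the same phone, a sentence made of intrinsically short phones is not mistaken for fast speech, which is the sensitivity to lexical content that the abstract attributes to plain word rate and phone rate.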

15 citations

Journal Article
TL;DR: A neural network algorithm is developed for recognizing phonemes from a speaker's facial expressions in voice-activated control systems, using the phonetic word decoding method and requiring isolated-syllable pronunciation of voice commands.
Abstract: The paper considers phoneme recognition from the facial expressions of a speaker in voice-activated control systems. We have developed a neural network recognition algorithm by using the phonetic word decoding method and the requirement for isolated-syllable pronunciation of voice commands. The paper presents the experimental results of viseme (facial and lip position corresponding to a particular phoneme) classification of Russian vowels. We show the dependence of the classification accuracy on the classifier used (multilayer feed-forward network, support vector machine, k-nearest neighbor method), the image features (histogram of oriented gradients, eigenvectors, SURF local descriptors), and the type of camera (built-in or Kinect). The best accuracy of speaker-dependent recognition is shown to be 85% for a built-in camera and 96% for Kinect depth maps when the classification is performed with the histogram of oriented gradients and the support vector machine.

15 citations

Journal Article
TL;DR: Evidence is presented that listeners deploy both low-level grouping mechanisms and knowledge specific to speech when separating speech from other sounds.

14 citations

Journal Article
TL;DR: The aim of this article was to introduce an important tool, cross-recurrence analysis, to speech production applications by showing how it can be adapted to evaluate the similarity of multi-step comparisons.
Abstract: Purpose: The aim of this article was to introduce an important tool, cross-recurrence analysis, to speech production applications by showing how it can be adapted to evaluate the similarity of multi...

14 citations


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations, 78% related
Feature vector: 48.8K papers, 954.4K citations, 76% related
Feature extraction: 111.8K papers, 2.1M citations, 75% related
Feature (computer vision): 128.2K papers, 1.7M citations, 74% related
Unsupervised learning: 22.7K papers, 1M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22