Topic
Viseme
About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.
Papers
01 Jan 1995
TL;DR: The phone duration percentile, a comparison of measured versus expected phone duration, is shown to be robust with respect to lexical content and consistent with previous findings about the statistics of long-term and short-term speech rate.
Abstract: This report describes a series of experiments that measure speech rate and that attempt to improve speech recognition accuracy for rapidly spoken speech. Descriptions of several measures of speech rate are presented, with their advantages and disadvantages. Speech recognition results obtained using several compensation methods are compared to identify methods by which compensation for the effects of fast speech may yield the greatest improvement in recognition accuracy. Very simple measures of speech rate such as the word rate or phone rate are found to be unsuitable for detection of both long-term and short-term speech rate since they are sensitive to the lexical content of speech. In contrast, the phone duration percentile, a comparison of measured versus expected phone duration, is shown to be robust with respect to lexical content and consistent with previous findings about the statistics of long-term and short-term speech rate. Using this metric, speakers with a speech rate in the top 30% are found to produce a 50 to 150% increase in word error rate. The compensation techniques explored contain modifications to five components of the recognition system: the models of the acoustical characteristics of speech sounds, the models of the HMM state-transition probabilities, the pronunciations of words in the dictionary, the weight with which acoustic and linguistic evidence are combined, and the base phone set. Optimizing the language weight reduced the word error rate of fast speech by 10.3% relative to baseline performance. Adapting the state-transition probabilities to fast speech reduced the word error rate for fast speech by 2.6%. Using one of the modified pronunciation dictionaries reduced the word error rate of fast speech by 2.6%. The other techniques yielded little or no reduction in the word error rate.
15 citations
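The 1995 abstract above describes the phone duration percentile only as "a comparison of measured versus expected phone duration" and does not give a formula. The sketch below is one possible reading, not the report's method: it assumes the "expected" duration is represented by an empirical sample of reference durations per phone, and it scores an utterance by the mean percentile rank of its phones. The function names, phone labels, and durations are all hypothetical.

```python
from bisect import bisect_right


def duration_percentile(measured, reference_durations):
    """Percentile rank (0-100) of one measured phone duration within a
    sample of reference durations for the same phone (assumed definition)."""
    ref = sorted(reference_durations)
    if not ref:
        raise ValueError("reference_durations must not be empty")
    # Share of reference durations that are <= the measured duration.
    return 100.0 * bisect_right(ref, measured) / len(ref)


def utterance_rate_score(phones, reference):
    """Mean duration percentile over (phone, duration) pairs.
    Low values mean shorter-than-expected phones, i.e. faster speech."""
    scores = [duration_percentile(d, reference[p]) for p, d in phones]
    return sum(scores) / len(scores)


# Hypothetical reference durations in milliseconds, keyed by phone label.
reference = {
    "AA": [80, 90, 100, 110, 120],
    "T": [30, 40, 50, 60, 70],
}

fast = [("AA", 85), ("T", 35)]   # shorter than most reference tokens
slow = [("AA", 115), ("T", 65)]  # longer than most reference tokens

print(utterance_rate_score(fast, reference))
print(utterance_rate_score(slow, reference))
```

Because each phone is compared only against durations of the same phone, a score of this kind does not shift just because an utterance happens to contain inherently short or long phones, which is the robustness to lexical content that the abstract reports.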
TL;DR: A neural network algorithm is developed to recognize phonemes from a speaker's facial expressions in voice-activated control systems, using the phonetic word decoding method and requiring isolated-syllable pronunciation of voice commands.
Abstract: The paper considers phoneme recognition from a speaker's facial expressions in voice-activated control systems. We have developed a neural network recognition algorithm that uses the phonetic word decoding method and requires isolated-syllable pronunciation of voice commands. The paper presents experimental results on viseme (facial and lip position corresponding to a particular phoneme) classification of Russian vowels. We show how the classification accuracy depends on the classifier used (multilayer feed-forward network, support vector machine, k-nearest neighbor method), the image features (histogram of oriented gradients, eigenvectors, SURF local descriptors), and the type of camera (built-in or Kinect). The best speaker-dependent recognition accuracy is 85% for a built-in camera and 96% for Kinect depth maps, obtained with the histogram of oriented gradients and the support vector machine.
15 citations
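Of the classifiers compared in the abstract above, the k-nearest neighbor method is the simplest to illustrate. The following is a toy sketch, not the paper's pipeline: the two-dimensional feature vectors stand in for real image features such as histograms of oriented gradients, and the vowel labels and values are invented for illustration.

```python
from collections import Counter
import math


def knn_classify(query, examples, k=3):
    """Label a feature vector by majority vote among its k nearest
    labelled examples, using Euclidean distance."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (feature_vector, vowel_label) pairs; a real system would
# extract the vectors from mouth-region images.
examples = [
    ((1.0, 0.9), "a"),
    ((0.9, 1.0), "a"),
    ((0.8, 0.8), "a"),
    ((0.2, 0.3), "u"),
    ((0.3, 0.2), "u"),
    ((0.1, 0.1), "u"),
]

print(knn_classify((0.85, 0.9), examples))
print(knn_classify((0.2, 0.2), examples))
```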
TL;DR: Evidence is presented that listeners deploy both low-level grouping mechanisms and knowledge specific to speech when separating speech from other sounds.
14 citations
TL;DR: The aim of this article was to introduce an important tool, cross-recurrence analysis, to speech production applications by showing how it can be adapted to evaluate the similarity of multi-step comparisons.
Abstract: Purpose: The aim of this article was to introduce an important tool, cross-recurrence analysis, to speech production applications by showing how it can be adapted to evaluate the similarity of multi...
14 citations