scispace - formally typeset

Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published on this topic, receiving 17,889 citations.


Papers
Journal ArticleDOI
TL;DR: In this article, various aspects of the mechanisms of speech are studied, including the perception of speech, the sounds of speech, and word recognition, as well as articulation and the pronunciation of words.
Abstract: Various aspects of the mechanisms of speech are studied. One series of studies has concentrated on the perception of speech, the sounds of speech, and word recognition. Various models for speech recognition have been created. Another set of studies has focused on articulation, the pronunciation of words and the sounds of speech. This area has also been explored in considerable detail.
Patent
10 Jun 2021
TL;DR: In this paper, a method for generating a head model animation from a voice signal using an artificial intelligence model, and an electronic device implementing it, are presented. The method comprises the steps of: acquiring characteristics information from the voice signal; using the artificial intelligence model to acquire, from the characteristics information, a phoneme stream corresponding to the voice signal and a viseme stream corresponding to the phoneme stream; acquiring an animation curve for the visemes in the viseme stream; and generating the head model animation by applying the animation curve to the visemes of the merged phoneme and viseme streams.
Abstract: Disclosed are: a method for generating a head model animation from a voice signal using an artificial intelligence model; and an electronic device for implementing same. The disclosed method for generating a head model animation from a voice signal, carried out by the electronic device, comprises the steps of: acquiring characteristics information of a voice signal from the voice signal; by using the artificial intelligence model, acquiring, from the characteristics information, a phoneme stream corresponding to the voice signal, and a viseme stream corresponding to the phoneme stream; by using the artificial intelligence model, acquiring an animation curve of visemes included in the viseme stream; merging the phoneme stream with the viseme stream; and generating a head model animation by applying the animation curve to the visemes of the merged phoneme and viseme stream.
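The phoneme-to-viseme mapping step described in this abstract can be sketched as a timed lookup that merges adjacent phonemes sharing a viseme class. The phoneme names, viseme classes, and timing format below are invented for illustration; the patent's actual artificial-intelligence model is not reproduced here.

```python
# Hypothetical sketch of the phoneme-to-viseme stage. The phoneme
# inventory, viseme class names, and (label, start, end) timing format
# are illustrative assumptions, not the patent's actual model.

PHONEME_TO_VISEME = {
    "p": "BMP", "b": "BMP", "m": "BMP",   # bilabials share one viseme
    "f": "FV",  "v": "FV",                # labiodentals
    "aa": "AA", "iy": "IY", "uw": "UW",   # a few sample vowels
}

def phonemes_to_visemes(phoneme_stream):
    """Map a timed phoneme stream to a timed viseme stream,
    merging consecutive phonemes that share a viseme class."""
    visemes = []
    for phoneme, start, end in phoneme_stream:
        viseme = PHONEME_TO_VISEME.get(phoneme, "REST")
        if visemes and visemes[-1][0] == viseme:
            # Extend the previous viseme instead of emitting a duplicate.
            visemes[-1] = (viseme, visemes[-1][1], end)
        else:
            visemes.append((viseme, start, end))
    return visemes

stream = [("p", 0.00, 0.08), ("b", 0.08, 0.15), ("aa", 0.15, 0.40)]
print(phonemes_to_visemes(stream))  # [('BMP', 0.0, 0.15), ('AA', 0.15, 0.4)]
```

In a full pipeline such as the one the patent describes, the resulting viseme intervals would then drive per-viseme animation curves on the head model.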
Journal Article
TL;DR: This paper proposes a new strategy of speech synthesis that uses intermediate-sized units corresponding to half syllables, called `demisyllables' in order to produce computer-generated speech.
Abstract: Synthesis of English speech by computer can be accomplished in several different ways, depending on the size of the speech units that are used to produce voice output. The most widely used units for speech synthesis are phonemes (i.e., small speech units corresponding to individual phonetic items). An alternate method of producing computer-generated speech is to concatenate entire words of English in a method called `word-concatenation' synthesis. A third strategy, the one described in this paper, is to use intermediate-sized units corresponding to half syllables, called `demisyllables'.
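The demisyllable strategy above can be sketched as concatenation over a stored unit inventory. The demisyllable labels and the placeholder waveform chunks below are invented for illustration, not the paper's actual unit set.

```python
# Hypothetical demisyllable inventory: each half-syllable unit maps to a
# stored waveform chunk (placeholder sample lists here). Labels ending in
# "-" are syllable-initial units; labels starting with "-" are finals.
UNITS = {
    "he-": [0.1, 0.2], "-el": [0.3],
    "lo-": [0.4],      "-ou": [0.5, 0.6],
}

def synthesize(demisyllables):
    """Concatenate stored waveform units for a demisyllable sequence."""
    wave = []
    for unit in demisyllables:
        if unit not in UNITS:
            raise KeyError(f"no stored unit for {unit!r}")
        wave.extend(UNITS[unit])
    return wave

print(synthesize(["he-", "-el", "lo-", "-ou"]))  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

The appeal of demisyllables, as the abstract notes, is that the inventory stays far smaller than a word list while each unit still captures the transition within a half syllable.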
Book ChapterDOI
01 Jan 2020
TL;DR: This paper presents a system that recognizes lip movement for lip reading, using the Viola–Jones algorithm to detect the mouth region and the DCT to extract mouth features.
Abstract: This paper presents a system that recognizes lip movements for lip reading. Four lip gestures are recognized: rounded open, wide open, small open, and closed. These gestures are used to describe speech visually. First, we detect the mouth region in each frame using the Viola–Jones algorithm. Then, we use the DCT to extract mouth features. Recognition is performed by an HMM, which achieves a high accuracy of 84.99%.
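The DCT feature-extraction step in this pipeline can be sketched as follows. Mouth detection (Viola–Jones) and the HMM classifier are omitted, and the 8x8 patch size and the number of retained low-frequency coefficients are assumptions for the example, not the paper's settings.

```python
import math

# Sketch of DCT-based mouth-feature extraction. The patch is a square
# grid of grayscale values; we take its 2D orthonormal DCT-II and keep
# the low-frequency top-left block as the feature vector.

def dct_1d(signal):
    """Orthonormal 1D DCT-II of a list of samples."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                for j, x in enumerate(signal))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def mouth_features(patch, keep=4):
    """2D DCT of a square grayscale mouth patch (rows, then columns);
    keep the top-left `keep` x `keep` coefficients as features."""
    rows = [dct_1d(row) for row in patch]
    cols = [dct_1d([rows[i][j] for i in range(len(rows))])
            for j in range(len(rows[0]))]
    # cols[j][i] holds the coefficient at 2D frequency (i, j)
    return [cols[j][i] for i in range(keep) for j in range(keep)]

patch = [[1.0] * 8 for _ in range(8)]       # a flat (constant) patch
print(len(mouth_features(patch)))           # 16
```

For a constant patch, all energy lands in the DC coefficient, which is a quick sanity check that the transform is behaving; real mouth patches would spread energy into the low-frequency terms that describe lip shape.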
Book ChapterDOI
01 Jan 2004
TL;DR: This work describes a maximum a posteriori decoding strategy for feature-based recognizers and derives two normalization criteria useful for a segment-based Viterbi or A* search.
Abstract: Most speech recognizers use an observation space which is based on a temporal sequence of spectral “frames.” There is another class of recognizer which further processes these frames to produce a segment-based network, and represents each segment by a fixed-dimensional “feature.” In such feature-based recognizers the observation space takes the form of a temporal graph of feature vectors, so that any single segmentation of an utterance will use a subset of all possible feature vectors. In this work we describe a maximum a posteriori decoding strategy for feature-based recognizers and derive two normalization criteria useful for a segment-based Viterbi or A* search. We show how a segment-based recognizer is able to obtain good results on the tasks of phonetic and word recognition.
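The segment-based search described here can be sketched as a dynamic program over a lattice of candidate segments, where each segmentation of the utterance selects a subset of the segments. The lattice format and log-score values below are illustrative assumptions, not the authors' recognizer or their normalization criteria.

```python
# Sketch of a segment-based Viterbi-style search: each candidate segment
# is (start_frame, end_frame, label, log_score). We find the
# highest-scoring sequence of segments exactly covering [0, total_frames).

def best_segmentation(segments, total_frames):
    """Dynamic program over candidate segments; returns (score, labels)
    for the best segmentation, or None if no full cover exists."""
    # best[t] = (score, path) for the best partial segmentation ending at t
    best = {0: (0.0, [])}
    for t in range(1, total_frames + 1):
        candidates = []
        for start, end, label, score in segments:
            if end == t and start in best:
                prev_score, prev_path = best[start]
                candidates.append((prev_score + score, prev_path + [label]))
        if candidates:
            best[t] = max(candidates)
    return best.get(total_frames)

segments = [
    (0, 3, "s", -1.0), (0, 3, "z", -2.0),    # competing short hypotheses
    (3, 5, "iy", -0.5), (3, 5, "ih", -0.7),
    (0, 5, "sil", -4.0),                      # one long competing segment
]
print(best_segmentation(segments, 5))  # (-1.5, ['s', 'iy'])
```

Note how competing segmentations here use different numbers of segments (two short versus one long); this is exactly the situation that motivates the normalization criteria the paper derives, since raw scores over unequal segment counts are not directly comparable.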

Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (78% related)
Feature vector: 48.8K papers, 954.4K citations (76% related)
Feature extraction: 111.8K papers, 2.1M citations (75% related)
Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 7
2022: 12
2021: 13
2020: 39
2019: 19
2018: 22