Topic
Viseme
About: Viseme is a research topic. Over its lifetime, 865 publications have appeared within this topic, receiving 17,889 citations.
Papers
24 Sep 2020
TL;DR: A 3D talking head built with dynamic visemes produces animated mouth movements that deaf users can easily understand, showing its potential as a sign-language learning aid.
Abstract: This research aims to build an Android application that helps deaf people learn the Indonesian Sign Language System (SIBI). The application is packaged as a 3D animated talking head developed with the dynamic viseme method, which yields more natural animation because original human models are used in the building process. Using the Dirichlet Free-Form Deformation (DFFD) method for control, the concatenative method produces animated movements that deaf people can easily understand. Linear interpolation was also used so that transitions between the speaker's mouth shapes appear more natural. Four different human models were used to build the animations. Experimental testing was carried out with five surveys, scored using the Mean Opinion Score (MOS) method, to determine which model's mouth movements were easiest to understand. Testing on an Indonesian sentence yielded a score of 4.25. These results indicate that the 3D talking head has the potential to be developed so that deaf people can understand it.
3 citations
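The abstract above mentions linear interpolation between viseme mouth shapes. As a minimal sketch of that idea, assuming each keyframe is a vector of mouth-shape parameters such as DFFD control-point displacements (the array values below are hypothetical):

```python
import numpy as np

def interpolate_visemes(key_a, key_b, t):
    """Linearly interpolate between two viseme keyframes.

    key_a, key_b are vectors of mouth-shape parameters (e.g. DFFD
    control-point displacements); t runs from 0.0 (key_a) to 1.0 (key_b).
    """
    return (1.0 - t) * key_a + t * key_b

# Hypothetical keyframes for an open and a closed mouth shape.
viseme_open = np.array([0.0, 1.2, 0.8])
viseme_closed = np.array([0.5, 0.1, 0.0])
halfway = interpolate_visemes(viseme_open, viseme_closed, 0.5)
```

Stepping t frame by frame between consecutive visemes is what smooths the transitions the abstract describes.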
01 Oct 2006
TL;DR: This paper introduces mouth-shape sequence recognition techniques for continuous natural speech based on speech phoneme recognition, consisting of five major parts: triphone modeling, search-space organization, Viterbi searching, speech phoneme recognition, and path backtracking.
Abstract: As part of the effort to realize automatic synthesis of cartoon human faces, we explored mouth-shape sequence recognition techniques for continuous natural speech. In this paper we introduce such a method based on speech phoneme recognition. The method consists of five major parts: triphone modeling, search-space organization, Viterbi searching, speech phoneme recognition, and path backtracking. At the end of this paper, outputs of such a system are illustrated.
3 citations
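The entry above names Viterbi searching and path backtracking as two of its five parts. A minimal, generic sketch of that pair of steps over an HMM; the triphone modeling and search-space organization the paper describes are omitted, and all probabilities here are assumed inputs:

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely state path through an HMM, with backtracking.

    log_init:  (S,)    log initial-state probabilities
    log_trans: (S, S)  log transition probabilities (prev -> cur)
    log_emit:  (T, S)  per-frame log emission scores
    Returns the best state sequence of length T.
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]          # best score ending in each state
    back = np.zeros((T, S), dtype=int)      # best predecessor per frame/state
    for t in range(1, T):
        cand = score[:, None] + log_trans   # (S, S): prev state -> cur state
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Path backtracking: start from the best final state, walk back.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

The recognized phoneme states can then be mapped to mouth shapes to drive the face animation.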
26 Sep 2020
TL;DR: Examines a lip-reading method that recognizes words the user registers in advance and is optimized for that user with a small amount of data; it is appropriate for embedding in mobile devices in terms of both usability and small-vocabulary recognition accuracy.
Abstract: We have been developing a practical speech-enhancement system that supports laryngectomees. By interviewing users we captured essential issues, such as "utilization of existing devices", "the appearance needs to be inconspicuous", and "the device should be easy to use". Considering those needs, we use a smartphone platform and develop a speech-enhancement application, so that users look ordinary and need not buy any additional device. The key concept of our proposed system is to perform lip-reading and speech synthesis. In this study, we examined a lip-reading method that recognizes words the user registers in advance and is optimized for that user with a small amount of data. Thirty-six viseme images were compressed into very small representations using a variational autoencoder (VAE), and the training data for the word-recognition model was generated from them. A viseme is a group of phonemes with identical appearance on the lips. Our VAE-based viseme-sequence representation allows the system to adapt to a user with a very small training set. A word-recognition experiment using the VAE encoder and a CNN was performed with 20 Japanese words; it showed 65% recognition accuracy, rising to 100% when the first and second candidates are both counted. Lip-reading-based speech enhancement thus seems appropriate for embedding in mobile devices, considering both usability and small-vocabulary recognition accuracy.
3 citations
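The paper above compresses viseme images with a VAE encoder before word recognition. A minimal sketch of such an encoder in PyTorch; the 32x32 grayscale input and 8-dimensional latent space are assumptions for illustration, not the paper's actual settings:

```python
import torch
import torch.nn as nn

class VisemeEncoder(nn.Module):
    """Minimal VAE encoder: maps a small viseme image to a latent code."""
    def __init__(self, latent_dim=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
        )
        self.mu = nn.Linear(32 * 8 * 8, latent_dim)
        self.logvar = nn.Linear(32 * 8 * 8, latent_dim)

    def forward(self, x):
        h = self.conv(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z ~ N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

# One hypothetical 32x32 viseme frame; a word becomes a sequence of
# such latent codes, which a small CNN classifier then recognizes.
enc = VisemeEncoder()
z, mu, logvar = enc(torch.randn(1, 1, 32, 32))
```

Compressing each frame to a few latent dimensions is what lets the word model train on the very small per-user dataset the abstract emphasizes.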
12 Oct 1999
TL;DR: This research continues the development of the MARTI project by enhancing its facial modelling, and applies psychological ideas to facial modelling and animation in order to create highly believable, life-like facial synthesis.
Abstract: Usability should be paramount in the development of any multimodal interface. Unfortunately, the software industry markets its product upgrades on the number of additional features rather than attempting to improve user interaction. This tends to produce packages so complex that many of the facilities remain unknown to the user and invariably go unused. Instead, by using a synthetic human face or cartoon-style characters, we attempt to make the interface more transparent and begin to redress the complexity-versus-usability balance. This research continues the development of the MARTI project by enhancing its facial modelling. The study has considered key work in the fields of speechreading and lip reading, and has extended the domain to develop a novel conversational American English viseme set. Furthermore, the work applies psychological ideas to facial modelling and animation in order to create highly believable and life-like facial synthesis. New levels of visual accuracy have been achieved for both human and cartoon character animation, attaining the highest performance to date for automated speech-to-face articulation of teeth, tongue, lips, and jaw.
3 citations
TL;DR: Explains the influence of speaker individuality and demonstrates how visemes can be used to boost lipreading; the work has applications beyond machine lipreading.
Abstract: For machines to lipread, that is, to understand speech from lip movement, they decode lip motions (known as visemes) into the spoken sounds. We investigate the visual speech channel to further our understanding of visemes. This has applications beyond machine lipreading: speech therapists, animators, and psychologists can all benefit from this work. We explain the influence of speaker individuality, and demonstrate how visemes can be used to boost lipreading.
3 citations
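The entry above defines visemes as the visual units machines decode when lipreading. A minimal sketch of the standard many-to-one phoneme-to-viseme grouping that makes lipreading ambiguous; the class names and groupings below are illustrative examples, since published viseme sets vary between studies:

```python
# Illustrative phoneme-to-viseme table: phonemes that look identical
# on the lips collapse into one viseme (hypothetical grouping, not a
# published viseme set).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "z": "alveolar",
    "k": "velar", "g": "velar",
}

def phonemes_to_visemes(phonemes):
    """Collapse a phoneme sequence into its smaller-alphabet viseme sequence."""
    return [PHONEME_TO_VISEME.get(p, "other") for p in phonemes]

# "bat" and "mat" begin with the same viseme, so their first frames
# are indistinguishable to a lipreader.
print(phonemes_to_visemes(["b", "a", "t"]))  # ['bilabial', 'other', 'alveolar']
print(phonemes_to_visemes(["m", "a", "t"]))  # ['bilabial', 'other', 'alveolar']
```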