Topic

Viseme

About: A viseme is the visual counterpart of a phoneme: the mouth and facial configuration associated with a speech sound. Over the lifetime of this topic, 865 publications have been published, receiving 17,889 citations.


Papers
Journal ArticleDOI
TL;DR: A novel real-time multimodal human-avatar interaction (RTM-HAI) framework with vision-based remote animation control (RAC) that integrates audio-visual analysis and synthesis modules to realize multichannel and runtime animations, visual TTS, and real-time viseme detection and rendering.
Abstract: This paper presents a novel real-time multimodal human-avatar interaction (RTM-HAI) framework with vision-based remote animation control (RAC). The framework is designed for both mobile and desktop avatar-based human-machine or human-human visual communications in real-world scenarios. Using 3-D components stored in the Java mobile 3-D (M3G) file format, the avatar models can be flexibly constructed and customized on the fly on any mobile device or system that supports the M3G standard. For the RAC head tracker, we propose a 2-D real-time face detection/tracking strategy built around an interactive loop, in which detection and tracking complement each other for efficient and reliable face localization that tolerates extreme user movement. With the face location robustly tracked, the RAC head tracker selects a main user and estimates the user's head rolling, tilting, yawing, scaling, horizontal, and vertical motion in order to generate avatar animation parameters. The animation parameters can be used either locally or remotely and can be transmitted through a socket over the network. In addition, the framework integrates audio-visual analysis and synthesis modules to realize multichannel and runtime animations, visual TTS, and real-time viseme detection and rendering. The framework is an effective design for realistic future industrial products such as humanoid kiosks and human-to-human mobile communication.

70 citations
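The pose-to-parameter step in the abstract above lends itself to a compact illustration. Below is a minimal Python sketch of mapping a tracked head pose (roll, tilt, yaw, scale, horizontal and vertical offset) to clamped avatar animation parameters; the type names and numeric ranges are assumptions for illustration, not the paper's actual API.

```python
# Hypothetical sketch of the RAC head tracker's output stage: mapping a
# tracked face pose to avatar animation parameters. Names and ranges are
# illustrative assumptions, not the paper's interface.
from dataclasses import dataclass

@dataclass
class HeadPose:
    roll: float   # rotation about the view axis, degrees
    tilt: float   # pitch, degrees
    yaw: float    # left/right turn, degrees
    scale: float  # apparent face size relative to a calibration frame
    x: float      # horizontal face-centre offset, normalized to [-1, 1]
    y: float      # vertical face-centre offset, normalized to [-1, 1]

def to_animation_params(pose: HeadPose) -> dict:
    """Clamp a tracked pose into ranges an avatar rig can accept."""
    def clamp(v, lo, hi):
        return max(lo, min(hi, v))
    return {
        "head_roll": clamp(pose.roll, -45.0, 45.0),
        "head_tilt": clamp(pose.tilt, -30.0, 30.0),
        "head_yaw":  clamp(pose.yaw, -60.0, 60.0),
        "zoom":      clamp(pose.scale, 0.5, 2.0),
        "pan_x":     clamp(pose.x, -1.0, 1.0),
        "pan_y":     clamp(pose.y, -1.0, 1.0),
    }
```

Since the parameters form a flat dictionary, transmitting them remotely (as the abstract describes) can be as simple as serializing each frame's dictionary to JSON and writing it to a socket.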

Patent
16 Feb 2000
TL;DR: A speech transformation apparatus, as discussed by the authors, consists of a microphone for detecting speech and generating a speech signal; a signal processor for performing speech recognition on the speech signal; a speech information generator for transforming the recognition result according to the physical state of the user, the operating conditions, and/or the purpose for using the apparatus; and a display unit 26 and loudspeaker 25 driven by a control signal to output the raw recognition result and/or the transformed recognition result.
Abstract: A speech transformation apparatus comprises a microphone 21 for detecting speech and generating a speech signal; a signal processor 22 for performing a speech recognition process using the speech signal; a speech information generator for transforming the recognition result responsive to the physical state of the user, the operating conditions, and/or the purpose for using the apparatus; and a display unit 26 and loudspeaker 25 for generating a control signal for outputting a raw recognition result and/or a transformed recognition result. In a speech transformation apparatus thus constituted, speech enunciated by a spoken-language-impaired individual can be transformed and presented to the user, and sounds from outside sources can also be transformed and presented to the user.

69 citations
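The apparatus reduces to a short pipeline: recognize, transform according to a user/context profile, and present both results. A hedged Python sketch follows; all names are placeholders for illustration, not the patent's claims.

```python
# Hedged sketch of the patent's pipeline: recognize speech, transform the
# recognition result according to the user's state, operating conditions,
# and purpose, then emit both raw and transformed outputs. Placeholder
# names only; this is not the patent's actual structure.
from typing import Callable, Dict

def transform_speech(
    audio: bytes,
    recognize: Callable[[bytes], str],      # signal processor (any ASR engine)
    transform: Callable[[str, Dict], str],  # speech information generator
    profile: Dict,                          # user state, conditions, purpose
) -> Dict[str, str]:
    raw = recognize(audio)
    adapted = transform(raw, profile)
    # Both results are routed to the display unit and loudspeaker.
    return {"raw": raw, "transformed": adapted}
```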

Proceedings ArticleDOI
03 Jul 2001
TL;DR: A new technique for expressive and realistic speech animation in which an optical tracking system extracts the 3-D positions of markers attached at facial feature points to capture the movements of a talking person's face, and principal component analysis of this data forms a vector-space representation that offers insight into improving the realism of animated faces.
Abstract: We describe a new technique for expressive and realistic speech animation. We use an optical tracking system that extracts the 3D positions of markers attached at the feature point locations to capture the movements of the face of a talking person. We use the feature points as defined by the MPEG-4 standard. We then form a vector space representation by applying principal component analysis to this data. We call this space the "expression and viseme space". Such a representation not only offers insight into improving the realism of animated faces, but also gives a new way of generating convincing speech animation and blending between several expressions. Because rigid-body movements and deformation constraints on facial motion are accounted for in this analysis, the resulting facial animation is very realistic.

69 citations
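The construction of the expression and viseme space is essentially PCA over stacked marker trajectories. A minimal numpy sketch, assuming frames arrive as an (n_frames, n_markers * 3) array; this makes no claim to match the paper's implementation.

```python
# Minimal sketch of an "expression and viseme space": PCA over stacked
# 3-D marker positions from motion capture. Input shape and component
# count are illustrative assumptions.
import numpy as np

def build_viseme_space(frames: np.ndarray, n_components: int = 10):
    """frames: (n_frames, n_markers * 3) array of marker positions."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Principal components via SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]        # (n_components, n_markers * 3)
    coords = centered @ basis.T      # per-frame coordinates in the space
    return mean, basis, coords

def blend(mean, basis, coords_a, coords_b, alpha: float):
    """Linear blend of two expressions/visemes in the reduced space."""
    mixed = (1.0 - alpha) * coords_a + alpha * coords_b
    return mean + mixed @ basis      # back to marker positions
```

Blending between two expressions or visemes then reduces to interpolating their coordinates in the low-dimensional space, as `blend` shows, rather than interpolating raw marker positions.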

Journal ArticleDOI
TL;DR: In this article, a method for adaptive integration of acoustic and visual information in audio-visual ASR is presented. It is shown that using adaptive modality weights instead of fixed weights improves performance, and that weight estimation can benefit from using visemes as decision units for the visual recogniser.

68 citations
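The fusion rule behind such adaptive integration is commonly a weighted log-linear combination of the two modality likelihoods. A Python sketch follows; the SNR-based weight heuristic is an assumption for illustration, not the paper's estimator.

```python
# Sketch of audio-visual decision fusion with an adaptive modality weight:
# combine acoustic and visual log-likelihoods, with the weight estimated
# from current acoustic conditions. The SNR heuristic below is assumed,
# not taken from the paper.
import math

def fused_log_likelihood(log_p_audio: float, log_p_visual: float, lam: float) -> float:
    """Weighted log-linear combination; lam in [0, 1]."""
    return lam * log_p_audio + (1.0 - lam) * log_p_visual

def weight_from_snr(snr_db: float) -> float:
    """Toy adaptive weight: trust audio more as SNR improves (logistic in SNR)."""
    return 1.0 / (1.0 + math.exp(-(snr_db - 10.0) / 5.0))
```

With a fixed weight, fusion degrades whenever acoustic conditions change; re-estimating `lam` per utterance (or per frame) is what makes the integration adaptive.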

Patent
31 Aug 2009
TL;DR: In this paper, a hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce results for the same speech, which an arbitration engine then reconciles into a single output.
Abstract: A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results.

67 citations
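The arbitration idea admits a very small sketch: run both recognizers on the same utterance and pick a final output. The confidence-based rule below is illustrative; the patent leaves the arbitration policy open.

```python
# Hedged sketch of hybrid ASR arbitration: prefer whichever of the
# client-side and server-side hypotheses is available and more confident.
# The selection rule is an assumption for illustration.
from typing import Optional, Tuple

Result = Tuple[str, float]  # (transcript, confidence in [0, 1])

def arbitrate(client: Optional[Result], server: Optional[Result]) -> str:
    if client is None and server is None:
        return ""
    if server is None:   # e.g. offline: fall back to the client engine
        return client[0]
    if client is None:
        return server[0]
    # Both available: prefer the higher-confidence hypothesis.
    return client[0] if client[1] >= server[1] else server[0]
```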


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (78% related)
Feature vector: 48.8K papers, 954.4K citations (76% related)
Feature extraction: 111.8K papers, 2.1M citations (75% related)
Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22