Topic

Viseme

About: A viseme is the visual counterpart of a phoneme: the mouth and facial configuration associated with a speech sound. Over the lifetime of this topic, 865 publications have been published, receiving 17,889 citations.


Papers
Journal ArticleDOI
TL;DR: A novel real-time multimodal human-avatar interaction (RTM-HAI) framework with vision-based remote animation control (RAC) that integrates audio-visual analysis and synthesis modules to realize multichannel and runtime animations, visual TTS, and real-time viseme detection and rendering.
Abstract: This paper presents a novel real-time multimodal human-avatar interaction (RTM-HAI) framework with vision-based remote animation control (RAC). The framework is designed for both mobile and desktop avatar-based human-machine or human-human visual communications in real-world scenarios. Using 3-D components stored in the Java mobile 3-D (M3G) file format, the avatar models can be flexibly constructed and customized on the fly on any mobile device or system that supports the M3G standard. For the RAC head tracker, we propose a 2-D real-time face detection/tracking strategy built around an interactive loop, in which detection and tracking complement each other for efficient and reliable face localization that tolerates extreme user movement. With the face location robustly tracked, the RAC head tracker selects a main user and estimates the user's head rolling, tilting, yawing, scaling, horizontal, and vertical motion in order to generate avatar animation parameters. The animation parameters can be used either locally or remotely and can be transmitted through a socket over the network. In addition, the framework integrates audio-visual analysis and synthesis modules to realize multichannel and runtime animations, visual TTS, and real-time viseme detection and rendering. The framework is an effective design for realistic future industrial products such as humanoid kiosks and human-to-human mobile communication.

70 citations
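The pose-to-parameter step in the abstract above lends itself to a compact illustration. Below is a minimal Python sketch of mapping a tracked head pose (roll, tilt, yaw, scale, horizontal and vertical offset) to clamped avatar animation parameters; the type names and numeric ranges are assumptions for illustration, not the paper's actual API.

```python
# Hypothetical sketch of the RAC head tracker's output stage: mapping a
# tracked face pose to avatar animation parameters. Names and ranges are
# illustrative assumptions, not the paper's interface.
from dataclasses import dataclass

@dataclass
class HeadPose:
    roll: float   # rotation about the view axis, degrees
    tilt: float   # pitch, degrees
    yaw: float    # left/right turn, degrees
    scale: float  # apparent face size relative to a calibration frame
    x: float      # horizontal face-centre offset, normalized to [-1, 1]
    y: float      # vertical face-centre offset, normalized to [-1, 1]

def to_animation_params(pose: HeadPose) -> dict:
    """Clamp a tracked pose into ranges an avatar rig can accept."""
    def clamp(v, lo, hi):
        return max(lo, min(hi, v))
    return {
        "head_roll": clamp(pose.roll, -45.0, 45.0),
        "head_tilt": clamp(pose.tilt, -30.0, 30.0),
        "head_yaw":  clamp(pose.yaw, -60.0, 60.0),
        "zoom":      clamp(pose.scale, 0.5, 2.0),
        "pan_x":     clamp(pose.x, -1.0, 1.0),
        "pan_y":     clamp(pose.y, -1.0, 1.0),
    }
```

Since the parameters form a flat dictionary, transmitting them remotely (as the abstract describes) can be as simple as serializing each frame's dictionary to JSON and writing it to a socket.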

Patent
16 Feb 2000
TL;DR: A speech transformation apparatus, as discussed by the authors, consists of a microphone for detecting speech and generating a speech signal; a signal processor for performing speech recognition on the speech signal; a speech information generator for transforming the recognition result according to the physical state of the user, the operating conditions, and/or the purpose for using the apparatus; and a display unit 26 and loudspeaker 25 driven by a control signal to output the raw recognition result and/or the transformed recognition result.
Abstract: A speech transformation apparatus comprises a microphone 21 for detecting speech and generating a speech signal; a signal processor 22 for performing a speech recognition process using the speech signal; a speech information generator for transforming the recognition result responsive to the physical state of the user, the operating conditions, and/or the purpose for using the apparatus; and a display unit 26 and loudspeaker 25 for generating a control signal for outputting a raw recognition result and/or a transformed recognition result. In a speech transformation apparatus thus constituted, speech enunciated by a spoken-language-impaired individual can be transformed and presented to the user, and sounds from outside sources can also be transformed and presented to the user.

69 citations
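The apparatus reduces to a short pipeline: recognize, transform according to a user/context profile, and present both results. A hedged Python sketch follows; all names are placeholders for illustration, not the patent's claims.

```python
# Hedged sketch of the patent's pipeline: recognize speech, transform the
# recognition result according to the user's state, operating conditions,
# and purpose, then emit both raw and transformed outputs. Placeholder
# names only; this is not the patent's actual structure.
from typing import Callable, Dict

def transform_speech(
    audio: bytes,
    recognize: Callable[[bytes], str],      # signal processor (any ASR engine)
    transform: Callable[[str, Dict], str],  # speech information generator
    profile: Dict,                          # user state, conditions, purpose
) -> Dict[str, str]:
    raw = recognize(audio)
    adapted = transform(raw, profile)
    # Both results are routed to the display unit and loudspeaker.
    return {"raw": raw, "transformed": adapted}
```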

Proceedings ArticleDOI
03 Jul 2001
TL;DR: A new technique for expressive and realistic speech animation in which an optical tracking system extracts the 3-D positions of markers attached at facial feature points to capture the movements of a talking person's face, and principal component analysis of this data forms a vector-space representation that offers insight into improving the realism of animated faces.
Abstract: We describe a new technique for expressive and realistic speech animation. We use an optical tracking system that extracts the 3D positions of markers attached at the feature point locations to capture the movements of the face of a talking person. We use the feature points as defined by the MPEG-4 standard. We then form a vector space representation by applying principal component analysis to this data. We call this space the "expression and viseme space". Such a representation not only offers insight into improving the realism of animated faces, but also gives a new way of generating convincing speech animation and blending between several expressions. Because rigid-body movements and deformation constraints on facial motion are accounted for in this analysis, the resulting facial animation is very realistic.

69 citations
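The construction of the expression and viseme space is essentially PCA over stacked marker trajectories. A minimal numpy sketch, assuming frames arrive as an (n_frames, n_markers * 3) array; this makes no claim to match the paper's implementation.

```python
# Minimal sketch of an "expression and viseme space": PCA over stacked
# 3-D marker positions from motion capture. Input shape and component
# count are illustrative assumptions.
import numpy as np

def build_viseme_space(frames: np.ndarray, n_components: int = 10):
    """frames: (n_frames, n_markers * 3) array of marker positions."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Principal components via SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]        # (n_components, n_markers * 3)
    coords = centered @ basis.T      # per-frame coordinates in the space
    return mean, basis, coords

def blend(mean, basis, coords_a, coords_b, alpha: float):
    """Linear blend of two expressions/visemes in the reduced space."""
    mixed = (1.0 - alpha) * coords_a + alpha * coords_b
    return mean + mixed @ basis      # back to marker positions
```

Blending between two expressions or visemes then reduces to interpolating their coordinates in the low-dimensional space, as `blend` shows, rather than interpolating raw marker positions.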

Journal ArticleDOI
TL;DR: In this article, a method for adaptive integration of acoustic and visual information in audio-visual ASR is presented. It is shown that using adaptive modality weights instead of fixed weights improves performance, and that weight estimation can benefit from using visemes as decision units for the visual recogniser.

68 citations
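The fusion rule behind such adaptive integration is commonly a weighted log-linear combination of the two modality likelihoods. A Python sketch follows; the SNR-based weight heuristic is an assumption for illustration, not the paper's estimator.

```python
# Sketch of audio-visual decision fusion with an adaptive modality weight:
# combine acoustic and visual log-likelihoods, with the weight estimated
# from current acoustic conditions. The SNR heuristic below is assumed,
# not taken from the paper.
import math

def fused_log_likelihood(log_p_audio: float, log_p_visual: float, lam: float) -> float:
    """Weighted log-linear combination; lam in [0, 1]."""
    return lam * log_p_audio + (1.0 - lam) * log_p_visual

def weight_from_snr(snr_db: float) -> float:
    """Toy adaptive weight: trust audio more as SNR improves (logistic in SNR)."""
    return 1.0 / (1.0 + math.exp(-(snr_db - 10.0) / 5.0))
```

With a fixed weight, fusion degrades whenever acoustic conditions change; re-estimating `lam` per utterance (or per frame) is what makes the integration adaptive.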

Patent
31 Aug 2009
TL;DR: In this paper, a hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce results for the same speech, which an arbitration engine then reconciles into a single output.
Abstract: A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results.

67 citations
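The arbitration idea admits a very small sketch: run both recognizers on the same utterance and pick a final output. The confidence-based rule below is illustrative; the patent leaves the arbitration policy open.

```python
# Hedged sketch of hybrid ASR arbitration: prefer whichever of the
# client-side and server-side hypotheses is available and more confident.
# The selection rule is an assumption for illustration.
from typing import Optional, Tuple

Result = Tuple[str, float]  # (transcript, confidence in [0, 1])

def arbitrate(client: Optional[Result], server: Optional[Result]) -> str:
    if client is None and server is None:
        return ""
    if server is None:   # e.g. offline: fall back to the client engine
        return client[0]
    if client is None:
        return server[0]
    # Both available: prefer the higher-confidence hypothesis.
    return client[0] if client[1] >= server[1] else server[0]
```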


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (78% related)
Feature vector: 48.8K papers, 954.4K citations (76% related)
Feature extraction: 111.8K papers, 2.1M citations (75% related)
Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22