scispace - formally typeset

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic receiving 17889 citations.


Papers
Book ChapterDOI
01 Jan 2009
TL;DR: Early speech perception studies sought to determine which speech sound contrasts infants could detect. Research has since shown that infants can discriminate a wide range of speech sounds and that, by 12 months, they categorically perceive speech sounds; segment units from the speech stream; learn about legal sound combinations, rhythm, and stress; and track statistical properties of the speech input.
Abstract: Speech perception proceeds by extracting acoustic cues and mapping them onto linguistic information. Early speech perception studies sought to determine which speech sound contrasts infants could detect. Over the past few decades, research has shown that young infants can discriminate a wide range of speech sounds, and by 12 months, infants categorically perceive speech sounds; segment units from the speech stream; learn about legal sound combinations, rhythm, and stress; and track statistical properties of the speech input. Infants then use this knowledge to begin extracting and learning words. This article reviews infant speech abilities over the first 2 years of life, discusses theoretical accounts, and outlines some challenges.

2 citations

Book
01 Jan 1999
TL;DR: The present work focuses on Speech and Voice Perception, Speech Production and Perception Models and their Applications to Synthesis, Recognition, and Coding.
Abstract: Section 1 - Fundamentals of Speech Analysis and Perception:
- Articulatory Constraints on Distinctive Features
- "Herr Muller vivra a Taranto con i suoi colleghi austriaci": Investigations on a Fragment of Italian Phonology
- Acoustic Analysis and Perception of Classes of Sounds (Vowels and Consonants)
- Speech and Voice Perception: Beyond Pattern Recognition
Section 2 - Speech Processing:
- Analysis in Automatic Recognition of Speech
- Speech Production and Perception Models and their Applications to Synthesis, Recognition, and Coding
Section 3 - Stochastic Models for Speech:
- Statistical Methods for Automatic Speech Recognition
- Statistical Modelling: from Speech Recognition to Text Translation
- Continuous Speech Recognition with Neural Networks: An Application to Railway Timetable Enquiries
- Multi-Level Multi-Decision Model for Automatic Speech Recognition and Understanding
- Generative Models for Automatic Speech Recognition, Understanding and Synthesis
- Speech Modelling Virtual Laboratory
Section 4 - Auditory and Neural Network Models for Speech:
- Auditory Modeling and Neural Networks
- Neural Networks for Automatic Speech Recognition: a Review
- Preprocessing and Classification of English Stops, Nasals and Fricatives
- Self-Organizing Feature Maps for Arabic Phonemes
Section 5 - Task-Oriented Applications of Automatic Speech Recognition and Synthesis:
- Towards Fully Automatic Speech Processing Techniques for Interactive Voice Servers
- Multi-modal Speech Synthesis with Applications
- Author Index

2 citations

01 Sep 2008
TL;DR: A parameterisation of lip movements is described which maintains the dynamic structure inherent in the task of producing speech sounds and is believed to be appropriate to various areas of speech modeling, in particular the synthesis of speech lip movements.
Abstract: In this paper we describe a parameterisation of lip movements which maintains the dynamic structure inherent in the task of producing speech sounds. A stereo capture system is used to reconstruct 3D models of a speaker producing sentences from the TIMIT corpus. This data is mapped into a space which maintains the relationships between samples and their temporal derivatives. By incorporating dynamic information within the parameterisation of lip movements we can model the cyclical structure, as well as the causal nature of speech movements as described by an underlying visual speech manifold. It is believed that such a structure will be appropriate to various areas of speech modeling, in particular the synthesis of speech lip movements.
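The paper's exact parameterisation is not reproduced here, but the core idea of pairing each sample with its temporal derivative can be sketched as below. This is a minimal illustration using finite differences; the function name, frame dimensionality, and toy trajectory are all invented for the example, not taken from the paper.

```python
import numpy as np

def augment_with_deltas(frames, dt=1.0):
    """Append finite-difference temporal derivatives to each frame, so the
    representation captures both the lip pose and its motion. Uses central
    differences in the interior and one-sided differences at the endpoints."""
    frames = np.asarray(frames, dtype=float)
    deltas = np.gradient(frames, dt, axis=0)   # per-frame derivative estimate
    return np.concatenate([frames, deltas], axis=1)

# Toy example: 4 frames of a 2-D lip-parameter trajectory.
traj = [[0.0, 1.0],
        [1.0, 1.5],
        [2.0, 2.0],
        [3.0, 2.5]]
aug = augment_with_deltas(traj)
print(aug.shape)  # (4, 4): each frame now carries its derivative as well
```

Embedding samples jointly with their derivatives is what lets a model represent the cyclical, causal structure of the movements rather than isolated static poses.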

2 citations

Book ChapterDOI
Guy Mercier, A. Cozannet, J. Vaissière
01 Jan 1988
TL;DR: A description of the speaker-dependent continuous speech understanding system KEAL-NEVEZH, an extension of the KEAL system, connected to ALOEMDA, an active chart parser modifying its strategy and linguistic capabilities.
Abstract: A description of the speaker-dependent continuous speech understanding system KEAL-NEVEZH is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, and word and sentence analysis. This new system is an extension of the KEAL system, connected to ALOEMDA, an active chart parser modifying its strategy and linguistic capabilities.

2 citations

Proceedings ArticleDOI
01 Nov 2011
TL;DR: This work presents a method to synchronize the image and the speech, using Microsoft's Speech Application Programming Interface (SAPI) as the speech synthesis tool.
Abstract: Synchronization between speech and mouth shape draws on several technologies, including computer vision, speech synthesis, and speech recognition. We present a method to synchronize the image and the speech, using Microsoft's Speech Application Programming Interface (SAPI) as the speech synthesis tool. Speech animation has two components: the speech and the image. The speech output is obtained from Text-to-Speech (TTS), and the viseme images are generated with the software FaceGen Modeller; three key pictures are imported into this software to calibrate and generate the face model. A viseme event handler written in C# connects each viseme to its mouth-shape image, so that as the images are loaded sequentially, each viseme is matched with the correct image. The main applications of speech synthesis are assistive devices, e.g. screen readers for people with visual impairment; a mute person can also use this technology to talk to others. In recent years, speech synthesis has been applied widely in service robotics and in entertainment productions such as language learning, education, video games, animations, and music videos.
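The viseme-to-image dispatch the abstract describes can be sketched as follows. SAPI numbers visemes 0-21 (0 is silence), but the image filenames, the handler signature, and the simulated event stream below are invented for illustration; the paper's actual handler is written in C# against SAPI's viseme event.

```python
# Hypothetical mapping from SAPI viseme IDs to mouth-shape images.
# The filenames are placeholders; a real system would map all 22 IDs.
VISEME_IMAGES = {
    0: "silence.png",    # rest position
    1: "ae_ax_ah.png",   # open vowels
    2: "aa.png",
    21: "p_b_m.png",     # closed lips
}

def on_viseme(viseme_id, display=print):
    """Handler invoked once per TTS viseme event: look up the mouth image
    for the viseme and show it, falling back to the rest position."""
    image = VISEME_IMAGES.get(viseme_id, VISEME_IMAGES[0])
    display(f"showing {image}")
    return image

# Simulated viseme event stream for a short utterance; in the real system
# these events are raised by the TTS engine as it speaks.
events = [21, 1, 2, 0]
shown = [on_viseme(v, display=lambda s: None) for v in events]
print(shown)
```

The essential design point is that the TTS engine, not the animation code, drives the timing: each viseme event arrives already synchronized with the audio, so swapping images in the handler keeps mouth shape and speech aligned for free.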

2 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
78% related
Feature vector
48.8K papers, 954.4K citations
76% related
Feature extraction
111.8K papers, 2.1M citations
75% related
Feature (computer vision)
128.2K papers, 1.7M citations
74% related
Unsupervised learning
22.7K papers, 1M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  7
2022  12
2021  13
2020  39
2019  19
2018  22