Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
01 Jan 1989
TL;DR: This thesis discusses vowel recognition in continuous speech using a Gaussian classifier, a neural network, and hidden Markov models.
Abstract: Table of contents of the thesis "Vowel Recognition in Continuous Speech" (11/14/89):
Chapter 1, Speech Understanding: The Speech Recognition Problem; Speaker Related Systems; Continuous Speech; Vocabulary Size; The Vowel Classifier.
Chapter 2, Phonetics: Phoneme Variability; Coarticulation; Spectrogram Reading.
Chapter 3, Production, Acoustics and Perception of Vowels: Source-Filter Theory; The Source; The Filter; Vowel Production; Diphthongs; Semi-Vowels; Vowel Nasalization; Vowel Acoustics; Effects of Coarticulation; Vowel Perception; Automatic Vowel Recognition; Summary.
Chapter 4, System Implementation: General Description; Database; Feature Sets; Linear Predictive Coding; Spectral Moments; Median Value; Formants and Fundamental Frequency; Vowel Extraction; Preclassification; Maximum Likelihood; Neural Network; Dynamic Classification Using Hidden Markov Models.
Chapter 5, Results and Conclusions: Database Size; Vowel Separability; Feature Set; Preclassification Results; Understanding Classification Errors; Dynamic Classification; Average Center Values; Three-Frame Sampling; Projecting Results; Conclusions; Further Studies.
Chapter 6, User Documentation: Building the Database; Designing the Neural Network; Designing the Gaussian Classifier; Designing the Hidden Markov Model; Extra Useful Routines.
References; Appendices A-E (Appendix E: Glossary).
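The thesis compares a maximum-likelihood (Gaussian) classifier, a neural network, and hidden Markov models as vowel classifiers. As a rough illustration of the first of these, here is a minimal Python sketch of a maximum-likelihood Gaussian classifier over formant-style features; the class and variable names and the [F1, F2] feature layout are assumptions for illustration, not the thesis's actual code.

```python
import numpy as np

class GaussianVowelClassifier:
    """One full-covariance Gaussian per vowel class; classification picks the
    class with the highest log-likelihood. A generic sketch of the technique."""

    def fit(self, X, y):
        # X: (n_samples, n_features), e.g. [F1, F2] formant pairs in Hz (assumed layout)
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False)
            self.params_[c] = (mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            mu, inv_cov, log_det = self.params_[c]
            d = X - mu
            # Gaussian log-likelihood up to a shared constant:
            # -0.5 * (log|Sigma| + d^T Sigma^{-1} d)
            scores.append(-0.5 * (log_det + np.einsum('ij,jk,ik->i', d, inv_cov, d)))
        return self.classes_[np.argmax(np.stack(scores), axis=0)]
```

Equal class priors are assumed; with unequal priors one would add log P(class) to each class score.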

1 citations

01 Jan 2012
TL;DR: The experimental results indicate that the new Visual Speech Unit concept achieves a 90% recognition rate when the system is applied to the identification of 60 classes of VSUs, whereas the recognition rate for the standard set of MPEG-4 visemes is only 52%.
Abstract: In this paper we propose a new learning-based representation, referred to as the Visual Speech Unit (VSU), for visual speech recognition (VSR). The new Visual Speech Unit concept extends the standard viseme model currently applied for VSR by including in the representation not only the data associated with the visemes, but also the transitory information between consecutive visemes. The developed speech recognition system consists of several computational stages: (a) lips segmentation, (b) construction of the Expectation-Maximization Principal Component Analysis (EM-PCA) manifolds from the input video image, (c) registration between the models of the VSUs and the EM-PCA data constructed from the input image sequence, and (d) recognition of the VSUs using a standard Hidden Markov Model (HMM) classification scheme. In this paper we were particularly interested in evaluating the classification accuracy obtained for our new VSU models compared with that attained for standard (MPEG-4) viseme models. The experimental results indicate that we achieved a 90% recognition rate when the system was applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only 52%.
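Stage (d) is a standard maximum-likelihood HMM decision: one HMM is trained per VSU class, and recognition selects the class whose model assigns the observed feature sequence the highest log-likelihood. The sketch below shows that stage using the third-party hmmlearn library; the function names, the 3-state topology, and diagonal covariances are illustrative assumptions, and the paper's stages (a)-(c) are abstracted into the per-frame feature arrays.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # third-party HMM library

def train_vsu_models(train_seqs, n_states=3):
    """train_seqs: dict mapping a VSU label to a list of (T_i, d) arrays of
    per-frame features (e.g. EM-PCA coefficients). Trains one HMM per class."""
    models = {}
    for label, seqs in train_seqs.items():
        X = np.vstack(seqs)               # concatenate all training sequences
        lengths = [len(s) for s in seqs]  # hmmlearn needs per-sequence lengths
        model = GaussianHMM(n_components=n_states, covariance_type="diag")
        models[label] = model.fit(X, lengths)
    return models

def recognize_vsu(models, features):
    """Classify a (T, d) feature sequence as the VSU whose HMM scores it highest."""
    return max(models, key=lambda label: models[label].score(features))
```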

1 citations

Proceedings Article
01 Jan 1997
TL;DR: This paper quantifies the geometry of speech turbulence, as reflected in the fragmentation of the time signal, using fractal models; it describes an algorithm for estimating the short-time fractal dimension of speech signals based on multiscale morphological filtering and discusses its potential for phonetic classification.
Abstract: The dynamics of airflow during speech production may often result in some small or large degree of turbulence. In this paper, we quantify the geometry of speech turbulence, as reflected in the fragmentation of the time signal, by using fractal models. We describe an efficient algorithm for estimating the short-time fractal dimension of speech signals based on multiscale morphological filtering and discuss its potential for phonetic classification. We also report experimental results on using the short-time fractal dimension of speech signals at multiple scales as additional features in an automatic speech recognition system using hidden Markov models, which provides a modest improvement in speech recognition performance.
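The covering construction behind such an algorithm is compact enough to sketch: dilate and erode the waveform with flat structuring elements of increasing scale s, take the area A(s) between the two envelopes, and estimate the fractal dimension D as the slope of log(A(s)/s^2) against log(1/s). The Python sketch below is a generic version of this morphological covering estimate, not the authors' implementation; the scale range is an arbitrary choice.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d, minimum_filter1d

def fractal_dimension(x, scales=(1, 2, 3, 4, 5, 6)):
    """Estimate the fractal (Minkowski-Bouligand) dimension of a 1-D signal
    by the morphological covering method: a generic sketch of the technique."""
    log_inv_s, log_area = [], []
    for s in scales:
        size = 2 * s + 1  # flat structuring element of half-width s
        # area between the dilated and eroded envelopes at scale s
        cover = maximum_filter1d(x, size) - minimum_filter1d(x, size)
        log_inv_s.append(np.log(1.0 / s))
        log_area.append(np.log(cover.sum() / s**2))
    # slope of log(A(s)/s^2) vs log(1/s) estimates the dimension D
    D, _ = np.polyfit(log_inv_s, log_area, 1)
    return D
```

For the short-time features used in the paper, an estimate like this would be computed over sliding analysis frames and appended to the recognizer's feature vector.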

1 citations

Proceedings Article
01 Jan 1995

1 citations

Journal Article (DOI)
TL;DR: Fifteen hours of training with each of these skilled lipreaders in the LA condition suggest that the consonants /s,l,TH/ are reliably identified through lipreading alone; of the nonvisible consonants, /t,d,k,g,n,y/ remain troublesome, and a processor optimized specifically to enhance those contrasts might prove the better speechreading aid.
Abstract: The homorganic obstruent pairs /p-b, t-d, k-g, ch-j, f-v, th-TH, s-z, sh-zh/ are notoriously confusable in the lipreading alone (LA) condition, and the nasal consonants /m/ and /n/ are often mistaken for their homorganic oral counterparts. Implant users' speechreading of these viseme group members is improved by the addition of either multiple-channel or single-channel electrical stimulation. The palatal obstruent distinctions /sh, zh, ch, j/ have been targeted for remediation via other speechreading aids. One approach to implant sound processor setting involves optimizing to distinguish the nonvisible frequently occurring consonants /t,d,k,s,z,n,l,TH/ (E. Schubert, personal communication). This optimization method resulted in significant speechreading improvement and even some open speech comprehension without lipreading for one deaf patient (M. White, personal communication). During a 50-week consonant training program conducted in our laboratory, two experimental subjects spent seven sessions identifying consonants in the LA condition, and five subsequent sessions identifying the same consonants in the stimulation plus lipreading condition, aided by a single-channel sound processor. Our 15 hours of training with each of these skilled lipreaders in the LA condition suggest that the consonants /s,l,TH/ are quite reliably identified through lipreading alone. Of the consonants not visible on the lips, /t,d,k,g,n,y/ are the troublesome contrasts. The single-channel sound processor provides some help in disambiguating these six consonants for deaf speechreaders, although a processor optimized to enhance specifically these contrasts might prove to be the better speechreading aid.
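The confusions described above track viseme-group structure: consonants sharing a place of articulation are nearly indistinguishable on the lips, which is why the homorganic pairs and the nasals pattern together. Below is a small Python sketch of such a grouping, usable for scoring lipreading-alone confusions; the exact group boundaries are an illustrative assumption rather than the article's published sets.

```python
# Consonants grouped into viseme classes by place of articulation
# (illustrative grouping, not taken verbatim from the article).
VISEME_GROUPS = {
    "bilabial":    {"p", "b", "m"},
    "labiodental": {"f", "v"},
    "dental":      {"th", "TH"},   # voiceless/voiced dental fricatives
    "alveolar":    {"t", "d", "n", "s", "z", "l"},
    "palatal":     {"sh", "zh", "ch", "j"},
    "velar":       {"k", "g"},
}

def same_viseme(a, b):
    """True if two consonants fall in the same viseme group, i.e. a
    lipreading-alone confusion between them is expected."""
    return any(a in group and b in group for group in VISEME_GROUPS.values())

# e.g. same_viseme("p", "b") -> True, same_viseme("s", "sh") -> False
```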

1 citations


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations, 78% related
Feature vector: 48.8K papers, 954.4K citations, 76% related
Feature extraction: 111.8K papers, 2.1M citations, 75% related
Feature (computer vision): 128.2K papers, 1.7M citations, 74% related
Unsupervised learning: 22.7K papers, 1M citations, 73% related
Performance Metrics
No. of papers in the topic in previous years:
Year / Papers
2023: 7
2022: 12
2021: 13
2020: 39
2019: 19
2018: 22