
Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published on this topic, receiving 17,889 citations.


Papers
Proceedings ArticleDOI
04 Oct 2004
TL;DR: Leveraging the synchrony between speech and finger tapping provides a 46% relative improvement in connected digit recognition experiments and a 1% absolute improvement in LVCSR experiments.
Abstract: Behavioral synchronization between speech and finger tapping provides a novel approach to improving speech recognition accuracy. We combine a sequence of finger-tapping timings, recorded alongside an utterance, with the speech signal using two distinct methods: in the first method, HMM state transition probabilities at word boundaries are controlled by the timing of the finger tapping; in the second, the probability (relative frequency) of the finger tapping is used as a 'feature' and combined with MFCCs in an HMM recognition system. We evaluate these methods through connected digit recognition under different noise conditions (AURORA-2J) and LVCSR tasks. Leveraging the synchrony between speech and finger tapping provides a 46% relative improvement in the connected digit recognition experiments and a 1% absolute improvement in the LVCSR experiments.

4 citations
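As a rough illustration of the second fusion method above, the sketch below appends a tap-probability stream to per-frame MFCC features. This is a minimal Python sketch assuming NumPy, a precomputed MFCC matrix, and illustrative frame-rate and window values; it is not the paper's implementation.

```python
# Feature-level fusion sketch: the relative frequency of finger taps near
# each frame is appended to the MFCC vectors as one extra dimension.
# Frame rate, window size, and array shapes are illustrative assumptions.
import numpy as np

def tap_probability(tap_times, n_frames, frame_rate=100.0, window=0.2):
    """Relative frequency of taps in a window around each frame centre."""
    centres = np.arange(n_frames) / frame_rate          # frame centres in seconds
    taps = np.asarray(tap_times)
    # count taps falling inside +/- window/2 of each frame centre
    counts = np.array([np.sum(np.abs(taps - t) <= window / 2) for t in centres])
    return counts / max(len(taps), 1)                   # normalise to a frequency

def fuse_features(mfcc, tap_times, frame_rate=100.0):
    """Append the tap-probability stream to an (n_frames, n_mfcc) MFCC matrix."""
    p = tap_probability(tap_times, mfcc.shape[0], frame_rate)
    return np.hstack([mfcc, p[:, None]])                # (n_frames, n_mfcc + 1)

# Example: 300 frames of 13-dim MFCCs plus taps every 0.5 s.
fused = fuse_features(np.random.randn(300, 13), tap_times=np.arange(0.0, 3.0, 0.5))
print(fused.shape)   # (300, 14)
```

The fused vectors would then be fed to a standard HMM recogniser in place of the plain MFCCs.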

Proceedings ArticleDOI
01 Aug 2016
TL;DR: Experimental results show that small changes to the allophone set may provide better speech recognition quality than the phoneme-based approach.
Abstract: The article presents studies on automatic whispery speech recognition. The research uses a new corpus of whispery speech. It was examined whether an extended set of articulatory units (allophones used instead of phonemes) improves the quality of whispery speech recognition. Experimental results show that small changes to the allophone set may provide better speech recognition quality than the phoneme-based approach. The authors have also made available a trained g2p (grapheme-to-phoneme) model of the Polish language for the Sequitur toolkit.

4 citations
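The allophone idea above can be pictured as a context-dependent rewrite of a phoneme-level lexicon before acoustic model training. The sketch below is a toy Python illustration with hypothetical rules and symbols; it does not reproduce the paper's Polish allophone inventory.

```python
# Toy context-dependent phoneme-to-allophone rewrite. The rules and the
# allophone symbols ("N", "t'") are hypothetical examples, not the
# paper's actual inventory.
def to_allophones(phonemes):
    """Replace selected phonemes with context-dependent allophone symbols."""
    out = []
    for i, p in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p == "n" and nxt in {"k", "g"}:
            out.append("N")        # velar nasal allophone before velar stops
        elif p == "t" and nxt == "i":
            out.append("t'")       # palatalised variant before a front vowel
        else:
            out.append(p)
    return out

print(to_allophones(["b", "a", "n", "k"]))   # ['b', 'a', 'N', 'k']
```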

01 Jan 2003
TL;DR: A probabilistic and statistical framework for connected-word ASR based on acoustic-phonetic knowledge, which could overcome the disadvantages of the early acoustic-phonetic knowledge-based systems, the disadvantages that led the ASR community to switch to systems highly dependent on statistical pattern analysis methods.
Abstract: In spite of decades of research, Automatic Speech Recognition (ASR) is far from reaching the goal of performance close to Human Speech Recognition (HSR). One of the reasons for the unsatisfactory performance of state-of-the-art ASR systems, which are based largely on Hidden Markov Models (HMMs), is the inferior acoustic modeling of low-level, or phonetic-level, linguistic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal. But no acoustic-phonetic system exists that carries out large speech recognition tasks, for example, connected word or continuous speech recognition. We propose a probabilistic and statistical framework for connected word ASR based on the knowledge of acoustic phonetics. The proposed system is based on the idea of representing speech sounds by bundles of binary-valued articulatory phonetic features. The probabilistic framework requires only binary classifiers of phonetic features and the knowledge-based acoustic correlates of the features for the purpose of connected word speech recognition. We explore the use of Support Vector Machines (SVMs) for binary phonetic feature classification because SVMs offer properties well suited to our recognition task. In the proposed method, probabilistic segmentation of speech is obtained using SVM-based classifiers of manner phonetic features. The linguistically motivated landmarks obtained in each segmentation are used for classification of source and place phonetic features. Probabilistic segmentation paths are constrained using Finite State Automata (FSA) for isolated or connected word recognition. The proposed method could overcome the disadvantages encountered by the early acoustic-phonetic knowledge-based systems, the disadvantages that led the ASR community to switch to ASR systems highly dependent on statistical pattern analysis methods.

4 citations
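The core building block of the framework above, a bank of binary phonetic-feature classifiers, can be sketched with off-the-shelf SVMs. The Python example below trains one binary SVM per manner feature using scikit-learn; the feature names, the synthetic data, and the 12-dimensional acoustic correlates are illustrative stand-ins, not the paper's actual front end.

```python
# One binary SVM per manner phonetic feature; posteriors from these
# classifiers would drive the probabilistic segmentation. Data and
# feature names here are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC

MANNER_FEATURES = ["sonorant", "continuant", "syllabic"]   # illustrative subset

def train_feature_classifiers(X, feature_labels):
    """Train one binary SVM per manner feature.
    feature_labels maps feature name -> array of {0,1} labels, one per frame."""
    return {name: SVC(kernel="rbf", probability=True).fit(X, y)
            for name, y in feature_labels.items()}

# Synthetic data: 200 frames of 12-dim knowledge-based acoustic correlates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
labels = {f: rng.integers(0, 2, size=200) for f in MANNER_FEATURES}

clfs = train_feature_classifiers(X, labels)
# Per-frame posterior of each binary feature, later combined into
# probabilistic segmentation paths constrained by an FSA.
posteriors = {f: c.predict_proba(X)[:, 1] for f, c in clfs.items()}
```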

Posted Content
TL;DR: This paper is the first to show that 3D feature extraction methods can be applied to continuous sequence recognition tasks despite the unknown start positions and durations of each phoneme, and it confirms that 3D feature extraction methods improve accuracy compared to 2D feature extraction methods.
Abstract: Visual speech recognition aims to identify the sequence of phonemes from continuous speech. Unlike the traditional approach of using 2D image feature extraction methods to derive features of each video frame separately, this paper proposes a new approach using a 3D (spatio-temporal) Discrete Cosine Transform to extract features of each feasible sub-sequence of an input video; these are subsequently classified individually using Support Vector Machines and combined to find the most likely phoneme sequence using a tailor-made Hidden Markov Model. The algorithm is trained and tested on the VidTimit database to recognise sequences of phonemes as well as visemes (visual speech units). Furthermore, the system is extended with training on phoneme or viseme pairs (biphones) to counteract the co-articulation ambiguity of human speech. The test set accuracy for the recognition of phoneme sequences is 20%, and the accuracy for viseme sequences is 39%. Both results improve on the best values reported in other papers by approximately 2%. The contribution of the result is threefold. Firstly, this paper is the first to show that 3D feature extraction methods can be applied to continuous sequence recognition tasks despite the unknown start positions and durations of each phoneme. Secondly, the result confirms that 3D feature extraction methods improve accuracy compared to 2D feature extraction methods. Thirdly, the paper is the first to specifically compare an otherwise identical method with and without biphones, verifying that the usage of biphones has a positive impact on the result.

4 citations
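The 3D feature extraction step described above can be approximated with a spatio-temporal DCT over a block of mouth-region frames. The Python sketch below uses scipy.fft.dctn and keeps a low-frequency corner of the coefficient cube as the feature vector; the block size and coefficient counts are illustrative assumptions, not the paper's settings.

```python
# 3D (spatio-temporal) DCT features for one candidate sub-sequence:
# transform the whole (T, H, W) block at once and keep the low-frequency
# corner as a compact feature vector. Sizes here are assumptions.
import numpy as np
from scipy.fft import dctn

def dct3d_features(frames, keep=(4, 8, 8)):
    """frames: (T, H, W) grayscale mouth-region sub-sequence.
    Returns the lowest keep[0] x keep[1] x keep[2] DCT coefficients,
    flattened into a single feature vector."""
    coeffs = dctn(frames, norm="ortho")                 # 3D DCT over (T, H, W)
    t, h, w = keep
    return coeffs[:t, :h, :w].ravel()

# Example: a 12-frame, 32x48-pixel sub-sequence -> 256-dim feature vector.
feats = dct3d_features(np.random.rand(12, 32, 48))
print(feats.shape)   # (256,)
```

Each feasible sub-sequence would be encoded this way, classified by an SVM, and the per-sub-sequence scores combined by the HMM into a phoneme or viseme sequence.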

Proceedings ArticleDOI
27 Jun 2007
TL;DR: This paper addresses human lip synchronization in one of the most popular 3D modelling packages, Autodesk Maya, and describes an automatic way to analyse voice using Adobe After Effects and pass the result to Maya.
Abstract: Realistic facial synthesis is one of the most fundamental problems in computer graphics, and one of the most difficult. Lip synchronization is a very important part of this. This paper addresses human lip synchronization in one of the most popular 3D modelling packages, Autodesk Maya. An automatic way to analyse voice using Adobe After Effects, and how its result is passed to Maya, is described; this pipeline is still in progress, so the voice is currently analyzed and classified manually. Using Maya's scripting language, MEL (Maya Embedded Language), the recognized phonemes are associated with mouth positions to provide visemes for computer animation of speech. The different mouth positions are created using blend shape deformers, which let you deform a surface into the shapes of other surfaces. Lip animation is facilitated by activating facial muscles and the jaw on a facial model created specially for animation. High-speed, natural-looking synchronized lip animation is achieved. It is planned to model the expressive visual features of expressive speech. Informal user testing suggests that the addition of detailed internal mouth structures, such as the tongue, would improve phrase recognition.

4 citations
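The blend-shape viseme setup described above can be sketched with Maya's Python commands (maya.cmds) rather than MEL. In the sketch below, the mesh and target names are hypothetical, and the timed viseme list is assumed to come from the (currently manual) voice-analysis step; it only runs inside a Maya session.

```python
# Hedged sketch: key blend-shape weights so each recognized viseme ramps
# in and out around its frame. Mesh/target names are hypothetical and the
# (frame, viseme_index) list stands in for the paper's analysis output.
import maya.cmds as cmds

BASE = "faceMesh"                                            # hypothetical base head mesh
TARGETS = ["visemeAA", "visemeEE", "visemeOO", "visemeMM"]   # sculpted mouth shapes

def key_visemes(timed_visemes, overlap=2):
    """Create a blendShape deformer on BASE and key one target weight
    per viseme. timed_visemes: list of (frame, target_index) pairs."""
    node = cmds.blendShape(TARGETS, BASE, name="visemeShapes")[0]
    for frame, idx in timed_visemes:
        attr = "%s.weight[%d]" % (node, idx)
        # ramp the active viseme in and out around its frame
        cmds.setKeyframe(attr, time=frame - overlap, value=0.0)
        cmds.setKeyframe(attr, time=frame, value=1.0)
        cmds.setKeyframe(attr, time=frame + overlap, value=0.0)
    return node

# e.g. "mama": MM at frame 10, AA at 14, MM at 18, AA at 22
key_visemes([(10, 3), (14, 0), (18, 3), (22, 0)])
```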


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (78% related)
Feature vector: 48.8K papers, 954.4K citations (76% related)
Feature extraction: 111.8K papers, 2.1M citations (75% related)
Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
Number of papers in the topic in previous years:

Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22