Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic receiving 17889 citations.


Papers
Proceedings ArticleDOI
06 Sep 2009
TL;DR: The probability of observing a context-dependent symbol is decomposed into the product of probabilities of observing the symbol and its contexts to allow wider contexts to be modelled without greatly compromising the model complexity.
Abstract: Recently, a Probabilistic Phone Mapping (PPM) model was proposed to facilitate cross-lingual automatic speech recognition using a foreign phonetic system. Under this framework, discrete hidden Markov models (HMMs) are used to map a foreign phone sequence to a target phone sequence. Context-sensitive mapping is made possible by expanding the discrete observation symbols to include the contexts of the foreign phones in which they appear in the sequence. Unfortunately, modelling the context dependencies jointly results in a dramatic increase in model parameters as wider contexts are used. In this paper, the probability of observing a context-dependent symbol is decomposed into the product of probabilities of observing the symbol and its contexts. This allows wider contexts to be modelled without greatly compromising the model complexity. It can be modelled conveniently using a multiple-stream discrete HMM system where the contexts are treated as independent streams. Experimental results are reported on the TIMIT English phone recognition task using the Czech, Hungarian and Russian foreign phone recognisers.
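The decomposition described above can be illustrated with a small sketch. The numbers and distributions below are made up for illustration; the point is the parameter count: a joint model over (left, centre, right) triples needs |phones|^3 entries per state, while the factorised multi-stream form needs only 3 × |phones|.

```python
# Hypothetical sketch of the context decomposition described above:
# instead of one joint distribution over (left, phone, right) triples,
# each HMM state keeps three independent discrete streams.
from itertools import product

phones = ["a", "b", "c"]

# Joint model: |phones|^3 parameters per state.
joint_size = len(phones) ** 3

# Factorised multi-stream model: 3 * |phones| parameters per state.
stream_size = 3 * len(phones)

def joint_prob(streams, left, centre, right):
    """P(left, centre, right) ~= P(left) * P(centre) * P(right)."""
    return streams["left"][left] * streams["centre"][centre] * streams["right"][right]

# Illustrative (made-up) per-stream distributions for one state.
uniform = {p: 1.0 / len(phones) for p in phones}
streams = {"left": uniform, "centre": uniform, "right": uniform}

# The factorised probabilities still sum to 1 over all context triples.
total = sum(joint_prob(streams, l, c, r) for l, c, r in product(phones, repeat=3))
print(joint_size, stream_size, round(total, 6))  # 27 9 1.0
```

With a realistic phone set of ~40 symbols the gap is 64,000 joint entries versus 120 per-stream entries per state, which is why the factorisation lets wider contexts be modelled cheaply.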

14 citations

Journal ArticleDOI
TL;DR: A significant increase of recognition effectiveness and processing speed were noted during tests – for properly selected CHMM parameters and an adequate codebook size, besides the use of the appropriate fusion of audio-visual characteristics.
Abstract: This paper focuses on combining audio-visual signals for Polish speech recognition under conditions of a highly disturbed audio speech signal. Recognition of audio-visual speech was based on combined hidden Markov models (CHMM). The described methods were developed for single isolated commands; nevertheless, their effectiveness indicates that they would also work similarly in continuous audio-visual speech recognition. The problem of visual speech analysis is very difficult and computationally demanding, mostly because of the extreme amount of data that needs to be processed. Therefore, the audio-video speech recognition method is used only while the audio speech signal is exposed to a considerable level of distortion. The authors' own methods of lip edge detection and visual characteristic extraction are proposed in this paper. Moreover, a method of fusing speech characteristics for an audio-video signal was proposed and tested. A significant increase in recognition effectiveness and processing speed was noted during tests, given properly selected CHMM parameters and an adequate codebook size, besides the use of an appropriate fusion of audio-visual characteristics. The experimental results were very promising and close to those achieved by leading scientists in the field of audio-visual speech recognition.
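One common way to fuse audio and visual streams in multi-stream/coupled HMM systems is a weighted sum of per-stream log-likelihoods, with the audio weight lowered as the audio channel degrades. The sketch below uses that generic scheme with made-up scores; it is not the paper's specific fusion method, only an illustration of the idea.

```python
def fused_log_likelihood(audio_ll, visual_ll, audio_weight):
    """Weighted sum of per-stream log-likelihoods, a common fusion
    scheme for multi-stream HMMs. The weight can be lowered when the
    audio channel is heavily distorted."""
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll

# Two candidate commands scored by each modality (made-up numbers).
scores = {
    "start": {"audio": -120.0, "visual": -80.0},
    "stop":  {"audio": -100.0, "visual": -110.0},
}

def recognise(scores, audio_weight):
    return max(scores, key=lambda w: fused_log_likelihood(
        scores[w]["audio"], scores[w]["visual"], audio_weight))

print(recognise(scores, 0.9))  # audio trusted: "stop" wins
print(recognise(scores, 0.2))  # distorted audio, visual stream decides: "start"
```

Note how the same scores yield different winners depending on the stream weight, which mirrors the paper's strategy of leaning on the visual channel only when the audio signal is considerably distorted.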

13 citations

Proceedings Article
01 Sep 1997
TL;DR: This paper presents a method for extracting articulatory parameters directly from raw images of the lips; the parameters are evaluated with an HMM-based visual speech recogniser, and the recognition scores obtained are compared to reference scores.
Abstract: This paper presents a method for the extraction of articulatory parameters from direct processing of raw images of the lips. The system architecture is made of three independent parts. First, a new greyscale mouth image is centred and downsampled. Second, the image is aligned and projected onto a basis of artificial images. These images are the eigenvectors computed from a PCA applied to a set of 23 reference lip shapes. Then, a multilinear interpolation predicts articulatory parameters from the image projection coefficients onto the eigenvectors. In addition, the projection coefficients and the predicted parameters were evaluated by an HMM-based visual speech recogniser. Recognition scores obtained with our method are compared to reference scores and discussed.
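The projection-then-predict pipeline can be sketched in a few lines. This is a toy with a 4-pixel "image" and two made-up orthonormal eigenvectors, and it substitutes a plain linear map for the paper's multilinear interpolation; it only shows the shape of the computation, not the actual system.

```python
def project(image, mean, eigenvectors):
    """Project a (flattened, greyscale) mouth image onto the PCA basis:
    coefficient c_k = <image - mean, e_k>."""
    centred = [p - m for p, m in zip(image, mean)]
    return [sum(c * e for c, e in zip(centred, vec)) for vec in eigenvectors]

def predict_parameters(coeffs, weights, bias):
    """Linear stand-in for the multilinear interpolation step: each
    articulatory parameter is a weighted sum of projection coefficients."""
    return [b + sum(w * c for w, c in zip(row, coeffs))
            for row, b in zip(weights, bias)]

# Toy 4-pixel "image" and 2 orthonormal eigenvectors (made-up values).
mean = [0.5, 0.5, 0.5, 0.5]
eigenvectors = [[0.5, 0.5, -0.5, -0.5],
                [0.5, -0.5, 0.5, -0.5]]
image = [1.0, 0.0, 1.0, 0.0]

coeffs = project(image, mean, eigenvectors)
params = predict_parameters(coeffs, [[2.0, 0.0], [0.0, 3.0]], [0.1, 0.2])
print(coeffs, params)
```

In the real system the eigenvectors come from a PCA over the 23 reference lip shapes and the images are full downsampled frames, but the per-frame computation reduces to exactly this: centre, project, then map coefficients to articulatory parameters.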

13 citations

Proceedings ArticleDOI
09 Dec 2002
TL;DR: An overview of the large scale national project entitled "Spontaneous speech: corpus and processing technology" in Japan is given and the major results of experiments that have been conducted so far are reported, including spontaneous presentation speech recognition, automatic speech summarization, and message-driven speech recognition.
Abstract: How to recognize and understand spontaneous speech is one of the most important issues in state-of-the-art speech recognition technology. In this context, a five-year large scale national project entitled "Spontaneous speech: corpus and processing technology" started in Japan in 1999. This paper gives an overview of the project and reports on the major results of experiments that have been conducted so far at Tokyo Institute of Technology, including spontaneous presentation speech recognition, automatic speech summarization, and message-driven speech recognition. The paper also discusses the most important research problems to be solved in order to achieve ultimate spontaneous speech recognition systems.

13 citations

Patent
05 Feb 2015
TL;DR: In this article, a method and apparatus for speech recognition, for generating a speech recognition engine, and the speech recognition engine itself are presented, in which the engine obtains a phoneme sequence from the speech input and provides the recognition result based on the phonetic distance of the phoneme sequence.
Abstract: A method and apparatus for speech recognition and for generation of speech recognition engine, and a speech recognition engine are provided. The method of speech recognition involves receiving a speech input, transmitting the speech input to a speech recognition engine, and receiving a speech recognition result from the speech recognition engine, in which the speech recognition engine obtains a phoneme sequence from the speech input and provides the speech recognition result based on a phonetic distance of the phoneme sequence.
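A "phonetic distance" between phoneme sequences is often realised as an edit distance; the sketch below uses plain Levenshtein distance with hypothetical phoneme labels and a two-word toy lexicon. The patent's actual metric may weight substitutions by phonetic similarity rather than counting them uniformly.

```python
def phoneme_distance(seq_a, seq_b):
    """Levenshtein (edit) distance between two phoneme sequences,
    computed row by row with O(len(seq_b)) memory."""
    prev = list(range(len(seq_b) + 1))
    for i, a in enumerate(seq_a, 1):
        cur = [i]
        for j, b in enumerate(seq_b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (a != b)))   # substitution
        prev = cur
    return prev[-1]

# Match a recognised phoneme sequence against a small made-up lexicon.
lexicon = {"viseme": ["v", "ih", "z", "iy", "m"],
           "listen": ["l", "ih", "s", "ah", "n"]}
recognised = ["v", "ih", "s", "iy", "m"]
best = min(lexicon, key=lambda w: phoneme_distance(recognised, lexicon[w]))
print(best, phoneme_distance(recognised, lexicon[best]))  # viseme 1
```

Picking the lexicon entry with the smallest distance to the recognised sequence is the simplest form of the "result based on phonetic distance" idea described in the abstract.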

13 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
78% related
Feature vector
48.8K papers, 954.4K citations
76% related
Feature extraction
111.8K papers, 2.1M citations
75% related
Feature (computer vision)
128.2K papers, 1.7M citations
74% related
Unsupervised learning
22.7K papers, 1M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22