Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic receiving 17889 citations.


Papers
Patent
16 Jul 2008
TL;DR: This patent proposes a speech recognizer with less throughput and high recognition performance, especially for tone languages: tone information for a selected label is extracted from the fundamental frequency of the input speech, and the label is corrected on the basis of that tone information and the content of a pattern list.
Abstract: PROBLEM TO BE SOLVED: To provide a speech recognizer that has less throughput and high recognition performance especially for speech recognition of a tone language. SOLUTION: The speech recognizer: extracts a fundamental frequency from input speech, and acoustically analyses the input speech; selects one of plural speech recognition results obtained by speech recognition, and outputs a label string indicating the selected speech recognition result; selects at least one label in the output label string on the basis of a pattern list held in advance; and extracts tone information indicating tone of the selected label on the basis of the fundamental frequency extracted from the input speech, and corrects the selected label on the basis of the extracted tone information and content of the pattern list.

3 citations

Journal ArticleDOI
TL;DR: In this paper, a large number of oscillograms were taken, a number of which are reproduced herewith, and the most important observation is that the human voice can start several of the vowel sounds in such a way that the first wave is from 40 to 80 percent of the final amplitude.
Abstract: In view of its bearing on the design of ground noise reduction systems, a study was undertaken, to determine how sudden or rapid are the increases in amplitude of the speech sounds that must be recorded in dialogue. A large number of oscillograms were taken, a number of which are reproduced herewith. The most important observation is that the human voice can start several of the vowel sounds in such a way that the first wave is from 40 to 80 percent of the final amplitude, or in other words with a suddenness comparable to that of keying an oscillator, but this is rare, being for all practical purposes confined to a few of the more open vowel sounds, when not preceded by any consonant, and only true of certain individuals, and depending on the manner of releasing the breath. Progressive build‐up at rates which would carry the modulation from zero to 100 percent in 0.05 second are frequent, while the great majority of syllables start more gradually than this.

2 citations

Journal Article
Abstract: This thesis report is submitted in partial fulfilment of the requirements for the degree of Master of Science in Computer Science and Engineering, 2016.

2 citations

Proceedings ArticleDOI
22 Sep 2008
TL;DR: A comparative study of spontaneous and read Mandarin speech in the context of automatic speech recognition is presented, and the Multispace distribution (MSD) technique is applied to model partially continuous F0 contours.
Abstract: In this paper, we present a comparative study between spontaneous speech and read Mandarin speech in the context of automatic speech recognition. We focus on analysis and modeling of prosodic features, based on a unique speech corpus that contains similar amounts of read and spontaneous speech data from the same group of speakers. Statistical analysis is carried out on tone contours and duration of syllable and subsyllable units. Speech recognition experiments are performed to evaluate the effectiveness of different approaches to incorporate prosodic features into acoustic modeling. A key problem being addressed is how to deal with the unvoiced frames where F0 values are unavailable. We apply the technique of Multispace distribution (MSD) to model partially continuous F0 contours. For spontaneous speech, the tonal-syllable error rate is reduced from the MFCC baseline of 64.8% to 59.4% with the MSD based prosody model. For read speech, the performance improves from 46.0% to 36.4%.

2 citations
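
The abstract above mentions Multispace distribution (MSD) modeling for F0 contours that are undefined in unvoiced frames. A minimal sketch of the general idea follows, assuming a two-space model: a zero-dimensional "unvoiced" space that carries only a weight, and a one-dimensional "voiced" space with a single Gaussian over log-F0. The function name, the use of log-F0, and the single-Gaussian choice are illustrative assumptions, not details taken from the paper.

```python
import math


def msd_log_likelihood(frames, w_voiced, mean, std):
    """Log-likelihood of an F0 sequence under a two-space MSD (illustrative).

    frames:   F0 values in Hz; None marks an unvoiced frame (no F0 available).
    w_voiced: weight of the voiced (continuous) space; the unvoiced
              (zero-dimensional) space gets weight 1 - w_voiced.
    mean/std: Gaussian parameters over log-F0 in the voiced space (assumed).
    """
    total = 0.0
    for f0 in frames:
        if f0 is None:
            # Unvoiced frame: only the space weight contributes.
            total += math.log(1.0 - w_voiced)
        else:
            # Voiced frame: space weight times the Gaussian density of log-F0.
            z = (math.log(f0) - mean) / std
            log_pdf = -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))
            total += math.log(w_voiced) + log_pdf
    return total


# Example: a short contour with two unvoiced frames between voiced ones.
contour = [120.0, 125.0, None, None, 118.0]
score = msd_log_likelihood(contour, w_voiced=0.6, mean=math.log(120.0), std=0.1)
```

Because unvoiced frames are scored by a weight rather than an interpolated or zero-filled F0 value, the contour can stay partially continuous without inventing F0 where none was observed.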

Patent
13 Mar 2013
TL;DR: This patent describes methods and systems for speech processing of Oriya English, in which a plurality of speech samples comprising sounds of both vowels and consonants form a speech corpus, and the values of their speech parameters are compared with those of accent-neutral English.
Abstract: Method(s) and system(s) for speech processing of second language speech are described. According to the present subject matter, the system(s) implement the described method(s) for speech processing of Oriya English. The method for speech processing includes receiving a plurality of speech samples of Oriya English to form a speech corpus, where the plurality of speech samples comprise sounds of both vowels and consonants and a plurality of speech parameters are associated with each of the plurality of speech samples. The method also includes determining values of the plurality of speech parameters for each of the plurality of speech samples and identifying the difference between the values of each of the plurality of speech parameters and a corresponding value of accent neutral English. Further, the method includes articulating governing language rules based on the identifying to assess phonetic variation and mother tongue influence in sounds of vowels and consonants of Oriya English.

2 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
78% related
Feature vector
48.8K papers, 954.4K citations
76% related
Feature extraction
111.8K papers, 2.1M citations
75% related
Feature (computer vision)
128.2K papers, 1.7M citations
74% related
Unsupervised learning
22.7K papers, 1M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22