Book Chapter

Speech Features Analysis for Tone Language Speaker Discrimination Systems

TL;DR
A speech pattern analysis framework for tone language speaker discrimination systems is proposed. It rests on the hypothesis that speech feature variability is an efficient means of discriminating speakers, and the results confirm high inter-speaker variability (between speakers) and low intra-speaker variability (within speakers).
Abstract
In this paper, a speech pattern analysis framework for tone language speaker discrimination systems is proposed. We hypothesize that speech feature variability is an efficient means of discriminating speakers. To test this, we exploit prosody-related acoustic features (pitch, intensity and glottal pulse) extracted from corpus recordings of male and female speakers in four age categories: children (0–15), youths (16–30), adults (31–50) and seniors (above 50). The recordings were captured under suboptimal conditions. The speaker dataset was split into training, validation and test sets in the ratio 70%, 15% and 15%, respectively. A 41 × 14 self-organizing map (SOM) architecture was then used to model the speech features and thereby determine the relationship between the speech features, segments and patterns. The speech pattern analysis indicated wider F0 (fundamental frequency) variability among child speakers than among the other speaker groups; this gap narrows as speakers age. Further, intensity variability was similar across all speaker categories, while glottal pulse exhibited significant variation among the different speaker categories. SOM feature visualization confirmed high inter-speaker variability (between speakers) and low intra-speaker variability (within speakers).
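The data split and SOM modelling described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the SOM software, the training schedule, or how the features were normalized, so the learning rate, neighbourhood width, iteration count, function names and the synthetic three-column feature matrix (standing in for pitch, intensity and glottal pulse scaled to [0, 1]) are all assumptions made for illustration. Only the 70/15/15 ratio and the 41 × 14 grid come from the text.

```python
import numpy as np


def split_70_15_15(n_samples, rng):
    """Return shuffled index arrays for train/validation/test (70/15/15)."""
    idx = rng.permutation(n_samples)
    n_train = int(round(0.70 * n_samples))
    n_val = int(round(0.15 * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def best_matching_unit(weights, x):
    """Grid position (row, col) of the unit whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows=41, cols=14, n_iter=2000, lr0=0.5, seed=0):
    """Train a rows x cols SOM with online updates and a Gaussian neighbourhood.

    The schedule (linear decay of learning rate and radius) is an assumed,
    textbook-style choice, not one taken from the paper.
    """
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 1.0)
        x = data[rng.integers(len(data))]
        bmu = np.array(best_matching_unit(weights, x))
        # Squared grid distance of every unit to the best-matching unit.
        dist2 = np.sum((grid - bmu) ** 2, axis=-1)
        h = np.exp(-dist2 / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in for normalized (pitch, intensity, glottal pulse) rows.
    features = rng.random((200, 3))
    train_idx, val_idx, test_idx = split_70_15_15(len(features), rng)
    som = train_som(features[train_idx])
    # Map each held-out sample to its best-matching unit on the 41 x 14 grid.
    test_units = [best_matching_unit(som, x) for x in features[test_idx]]
    print(len(train_idx), len(val_idx), len(test_idx), som.shape)
```

In a sketch like this, samples from the same speaker that land on nearby grid units while different speakers occupy separate regions would correspond to the low intra-speaker and high inter-speaker variability the abstract reports.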

Citations
Book Chapter

A Complex Cognitive-Based Technique for Social Tension Detection in the Internet

TL;DR: The problem of automatic social tension detection is considered for combined content, including video and static pictures with text, on the presupposition that social tension can be detected by means of particular language markers and the identification of individuals' emotional states.