Journal Article

Large population speaker identification using clean and telephone speech

TLDR
This paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both clean, wideband speech, and telephone speech using a system based on Gaussian mixture speaker models.
Abstract
This paper presents text-independent speaker identification results for varying speaker population sizes up to 630 speakers for both clean, wideband speech, and telephone speech. A system based on Gaussian mixture speaker models is used for speaker identification, and experiments are conducted on the TIMIT and NTIMIT databases. The TIMIT results show large population performance under near-ideal conditions, and the NTIMIT results show the corresponding accuracy loss due to telephone transmission. These are believed to be the first speaker identification experiments on the complete 630 speaker TIMIT and NTIMIT databases and the largest text-independent speaker identification task reported to date. Identification accuracies of 99.5% and 60.7% were achieved on the TIMIT and NTIMIT databases, respectively.
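
The abstract does not spell out the decision rule. A common formulation for closed-set identification with Gaussian mixture speaker models is to score the test frames against every speaker's model and choose the speaker with the highest total log-likelihood. The sketch below illustrates that rule under stated assumptions: diagonal covariances, models that are already trained (training, typically done with EM, is omitted), and toy two-dimensional parameters. The names `log_gmm_density`, `identify_speaker`, `spk_a`, and `spk_b` are hypothetical and do not come from the paper.

```python
import math


def log_gmm_density(x, weights, means, variances):
    """Log-density of one feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            # Log of a 1-D Gaussian, summed over dimensions (diagonal covariance).
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(component_logs)
    return m + math.log(sum(math.exp(c - m) for c in component_logs))


def identify_speaker(frames, speaker_models):
    """Return the best-scoring speaker and all per-speaker log-likelihoods."""
    scores = {
        name: sum(log_gmm_density(x, *model) for x in frames)
        for name, model in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Toy, hand-picked models (assumption): (weights, means, variances) per speaker.
speaker_models = {
    "spk_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "spk_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best, scores = identify_speaker(test_frames, speaker_models)
print(best)
```

With 630 speakers the same loop simply runs over 630 models; the cost of scoring grows linearly with the population size.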

Citations
Journal Article

Significance of the Modified Group Delay Feature in Speech Recognition

TL;DR: The group delay function is modified to overcome distortions of the short-time spectral structure of speech caused by zeros close to the unit circle in the z-plane and by pitch periodicity effects; the result is called the modified group delay feature (MODGDF).
Journal Article

Telephony-based voice pathology assessment using automated speech analysis

TL;DR: A system for remotely detecting vocal fold pathologies using telephone-quality speech is presented; the reported detection accuracies are 87% for neuromuscular disorders, 78% for physical abnormalities, and 61% for mixed-pathology voice.
Journal Article

Human and computer recognition of regional accents and ethnic groups from British English speech

TL;DR: It seems that the state-of-the-art LID system performs much better on the standard 12 class NIST 2003 Language Recognition Evaluation task or the two class ethnic group recognition task than on the 14 class regional accent recognition task.
Journal Article

Robust text-independent speaker identification over telephone channels

TL;DR: Two approaches are studied: extracting features that are robust against channel variations, and transforming the speaker models to compensate for channel effects. These resulted in a 38% relative improvement on the closed-set condition (30 s training, 5 s testing) of the NIST'95 Evaluation task.
Proceedings Article

Application of the modified group delay function to speaker identification and discrimination

TL;DR: The modified group delay feature (MODGDF) is used as a front end feature in a Gaussian mixture model (GMM) based speaker identification system and it is shown that the MODGDF has speaker specific properties.
References
Proceedings Article

SWITCHBOARD: telephone speech corpus for research and development

TL;DR: SWITCHBOARD is a large multispeaker corpus of conversational speech and text that should be of interest to researchers in speaker authentication and large vocabulary speech recognition.
Journal Article

Speaker identification and verification using Gaussian mixture speaker models

TL;DR: High-performance speaker identification and verification systems are presented, based on Gaussian mixture speaker models (robust, statistically based representations of speaker identity) and evaluated on four publicly available speech databases.
Proceedings Article

NTIMIT: a phonetically balanced, continuous speech, telephone bandwidth speech database

TL;DR: The creation of the network TIMIT (NTIMIT) database, which is the result of transmitting the TIMIT database over the telephone network, is described, including characteristics useful for speech analysis and recognition.
Proceedings Article

Text independent speaker identification using automatic acoustic segmentation

TL;DR: An acoustic-class-dependent technique for text-independent speaker identification on very short utterances is described, based on maximum-likelihood estimation of a Gaussian mixture model representation of speaker identity.