Open Access

Robust speaker recognition

TLDR
A speaker segmentation and clustering system is implemented, aiming to improve the robustness of speaker recognition as well as automatic speech recognition performance in multiple-speaker scenarios such as telephone conversations and meetings.
Abstract
Automatic speaker recognition has become an increasingly important technology, required by many speech-aided applications. The main challenge for automatic speaker recognition is to deal with the variability of the environments and channels from which the speech was obtained. Previous work has achieved good results for clean, high-quality speech with matched training and test acoustic conditions, such as high accuracy in speaker identification and verification using clean wideband speech and Gaussian Mixture Models (GMM). However, under the mismatched conditions and noisy environments often expected in the real world, the performance of GMM-based systems degrades significantly and falls far short of a satisfactory level. Robustness is therefore a crucial research issue in the speaker recognition field. In this thesis, our main focus is to improve the robustness of speaker recognition systems on far-field distant microphones. We investigate approaches to improve robustness from two directions. First, we investigate approaches to improve the robustness of traditional speaker recognition systems, which are based on low-level spectral information. We introduce a new reverberation compensation approach which, along with feature warping in the feature processing procedure, improves system performance significantly. We propose four multiple-channel combination approaches, which utilize information from multiple far-field microphones, to improve robustness under mismatched training-testing conditions. Second, we investigate approaches that use high-level speaker information to improve robustness. We propose new techniques to model speaker pronunciation idiosyncrasy along two dimensions: the cross-stream dimension and the time dimension. Such high-level information is expected to be robust under different mismatched conditions. We also built systems that support robust speaker recognition.
We implemented a speaker segmentation and clustering system aimed at improving the robustness of speaker recognition, as well as automatic speech recognition performance, in multiple-speaker scenarios such as telephone conversations and meetings. We also integrated the speaker identification modality with a face recognition modality to build a robust person identification system.
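
The abstract names feature warping as part of the feature processing procedure but does not describe how it is configured. As a rough illustration only, the sketch below follows the commonly described form of the technique: each cepstral coefficient is replaced by the standard-normal quantile of its rank within a sliding window of neighbouring frames. The function name `warp_features`, the default window length, and the midrank handling of ties are assumptions made for this example and are not taken from the thesis; the reverberation compensation step is not shown.

```python
from statistics import NormalDist


def warp_features(frames, window=301):
    """Warp each feature dimension toward a standard normal distribution.

    frames: list of equal-length lists, one feature vector per frame.
    window: sliding-window length in frames (hypothetical default).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    n = len(frames)
    half = window // 2
    warped = []
    for t in range(n):
        # The window is centred on frame t and truncated at the signal edges.
        lo, hi = max(0, t - half), min(n, t + half + 1)
        size = hi - lo
        row = []
        for d in range(len(frames[t])):
            value = frames[t][d]
            below = sum(1 for k in range(lo, hi) if frames[k][d] < value)
            equal = sum(1 for k in range(lo, hi) if frames[k][d] == value)
            # Midrank of the current value in the window (1 = smallest).
            rank = below + (equal + 1) / 2
            # Map the rank to a probability strictly inside (0, 1), then to
            # the corresponding standard-normal quantile.
            row.append(std_normal.inv_cdf((rank - 0.5) / size))
        warped.append(row)
    return warped
```

Because only ranks within the window are used, scaling or shifting a coefficient by a constant (a simple model of a channel effect) leaves the warped values unchanged: under this sketch, `warp_features([[1.0], [2.0], [3.0]], window=3)` and `warp_features([[7.0], [9.0], [11.0]], window=3)` produce the same output.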

