Robust speaker recognition

Open Access

TLDR

A speaker segmentation and clustering system is implemented, aiming to improve the robustness of speaker recognition and automatic speech recognition in multiple-speaker scenarios such as telephone conversations and meetings.

Abstract:

Automatic speaker recognition has become an increasingly important technology for many speech-aided applications. Its main challenge is the variability of the environments and channels in which speech is recorded. Previous work has achieved good results on clean, high-quality speech with matched training and test acoustic conditions, for example high speaker identification and verification accuracy using clean wideband speech and Gaussian Mixture Models (GMMs). However, under the mismatched conditions and noisy environments expected in real-world use, the performance of GMM-based systems degrades significantly and falls well short of a satisfactory level. Robustness has therefore become a crucial research issue in the speaker recognition field. In this thesis, our main focus is to improve the robustness of speaker recognition systems on far-field (distant) microphones. We investigate approaches to improve robustness from two directions. First, we investigate approaches for the traditional speaker recognition system, which is based on low-level spectral information. We introduce a new reverberation compensation approach which, together with feature warping in the feature processing stage, improves system performance significantly. We also propose four multiple-channel combination approaches, which use information from multiple far-field microphones to improve robustness under mismatched training and testing conditions. Second, we investigate the use of high-level speaker information to improve robustness. We propose new techniques to model speaker pronunciation idiosyncrasy along two dimensions: the cross-stream dimension and the time dimension. Such high-level information is expected to be robust under different mismatched conditions.
We also built systems that support robust speaker recognition. We implemented a speaker segmentation and clustering system that aims to improve the robustness of both speaker recognition and automatic speech recognition in multiple-speaker scenarios such as telephone conversations and meetings. We also integrated the speaker identification modality with the face recognition modality to build a robust person identification system.
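
The abstract names feature warping as part of the feature processing stage but does not describe it. As a rough illustration only, the sketch below shows the generic idea of feature warping: within a sliding window, each value of a single feature stream is replaced by the standard-normal quantile of its rank, so that the short-term distribution of the feature looks Gaussian regardless of channel offset or gain. The function name `warp_features` and the default window of 301 frames (about 3 s at an assumed 10 ms frame shift) are assumptions made for this example, not details taken from the thesis.

```python
from statistics import NormalDist


def warp_features(frames, window=301):
    """Warp one feature stream toward a standard normal distribution.

    frames: list of floats, one cepstral coefficient over time.
    window: odd sliding-window length in frames (assumed default, not
            taken from the thesis).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    half = window // 2
    warped = []
    for t, value in enumerate(frames):
        # Clip the window at the start and end of the utterance.
        lo = max(0, t - half)
        hi = min(len(frames), t + half + 1)
        segment = frames[lo:hi]
        n = len(segment)
        # Rank of the current value inside the window (1 = smallest).
        rank = 1 + sum(1 for v in segment if v < value)
        # Map the rank to the matching quantile of the standard normal.
        warped.append(std_normal.inv_cdf((rank - 0.5) / n))
    return warped
```

Because only ranks are used, adding a constant to the stream or multiplying it by a positive gain leaves the warped output unchanged, which is the property that makes this kind of normalization attractive under mismatched channels.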

Citations

Robust speaker identification in noisy and reverberant conditions

Learning Speaker-Specific Characteristics With a Deep Neural Architecture

Far-Field Speaker Recognition

Related Papers (5)

Speaker Verification Using Adapted Gaussian Mixture Models

Speaker recognition: a tutorial

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

Robust text-independent speaker identification using Gaussian mixture speaker models

Speaker identification and verification using Gaussian mixture speaker models