Open Access · Posted Content

The SpeakIn System for VoxCeleb Speaker Recognition Challenge 2021

TL;DR: This report describes a speaker verification system built as a fusion of 9 models, which achieved first place in both tracks of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021.
Abstract
This report describes our submission to tracks 1 and 2 of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC 2021). Both tracks share the same speaker verification system, which uses only VoxCeleb2-dev as the training set. The report covers several components, including data augmentation, network structures, domain-based large margin fine-tuning, and back-end refinement. Our system is a fusion of 9 models and achieved first place in both tracks of VoxSRC 2021. The minDCF of our submission is 0.1034, and the corresponding EER is 1.8460%.
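The minDCF and EER figures above are the standard verification metrics reported by VoxSRC. As a rough illustration of how they are computed from a set of trial scores, here is a minimal numpy sketch; the `scores`/`labels` data and the `p_target` prior are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of EER and (normalized) minDCF from verification scores.
# The example data below is hypothetical, not from the VoxSRC evaluation.
import numpy as np

def eer_and_mindcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Compute equal error rate and minimum normalized detection cost."""
    order = np.argsort(scores)[::-1]                # sort scores high to low
    labels = np.asarray(labels)[order]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    # Sweep a threshold over every score: trials above it are accepted.
    fa = np.cumsum(labels == 0) / n_nontarget       # false-accept rate
    miss = 1.0 - np.cumsum(labels == 1) / n_target  # miss rate
    eer_idx = np.argmin(np.abs(fa - miss))
    eer = (fa[eer_idx] + miss[eer_idx]) / 2.0
    dcf = c_miss * miss * p_target + c_fa * fa * (1 - p_target)
    min_dcf = dcf.min() / min(c_miss * p_target, c_fa * (1 - p_target))
    return eer, min_dcf

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
eer, min_dcf = eer_and_mindcf(scores, labels)
```

In practice the challenge organizers' official scoring scripts define the exact cost parameters; this sketch only shows the shape of the computation.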


Citations
Posted Content

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

TL;DR: WavLM as mentioned in this paper proposes a pre-trained model to solve full-stack downstream speech tasks and achieves state-of-the-art performance on the SUPERB speech recognition task.
Journal ArticleDOI

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

TL;DR: WavLM as discussed by the authors jointly learns masked speech prediction and denoising in pre-training to solve full-stack downstream speech tasks and achieves state-of-the-art performance on the SUPERB benchmark.
Posted Content

Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification.

TL;DR: In this paper, the authors explore the limits of speech representations learned by different self-supervised objectives and datasets for automatic speaker verification (ASV), especially with a well-recognized SOTA ASV model, ECAPA-TDNN, as a downstream model.
Posted Content

Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification.

TL;DR: This article proposed a multi-query multi-head attention (MQMHA) pooling and inter-top-K penalty method, which achieved state-of-the-art performance in all the public VoxCeleb test sets.
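Attention pooling of this kind aggregates frame-level features into a single utterance-level embedding. The following is a generic multi-head attentive pooling sketch in numpy, simplified relative to the MQMHA layer described above (one learned query per head, no multi-query grouping); all array shapes are illustrative assumptions.

```python
# Generic multi-head attentive pooling over frame-level features.
# A simplified sketch, not the exact MQMHA layer from the cited paper.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attentive_pool(frames, queries):
    """frames: (T, D) frame features; queries: (D, H), one query per head.
    Returns an (H*D,) utterance embedding: one attention-weighted mean
    of the frames per head, concatenated across heads."""
    att = softmax(frames @ queries, axis=0)   # (T, H) weights over time
    pooled = att.T @ frames                   # (H, D) per-head weighted means
    return pooled.reshape(-1)

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 8))         # 50 frames, 8-dim features
queries = rng.standard_normal((8, 4))         # 4 attention heads
emb = multihead_attentive_pool(frames, queries)
```

Each head's output is a convex combination of the input frames, so different heads can attend to different temporal regions of the utterance.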
Posted Content

Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information.

TL;DR: The authors showed that the typical training and scoring protocols do not put enough emphasis on the compensation of intra-speaker language variability and proposed two techniques to increase cross-lingual speaker verification robustness.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this paper, the authors proposed a residual learning framework to ease the training of networks substantially deeper than those used previously, which won first place in the ILSVRC 2015 classification task.
Proceedings Article

The Kaldi Speech Recognition Toolkit

TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Proceedings ArticleDOI

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.
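The core of ArcFace is adding a fixed angular margin to the ground-truth class angle before the scaled softmax. A minimal numpy sketch of that logit computation follows; the shapes, scale `s`, and margin `m` are illustrative defaults, and a real implementation would be a trainable layer (e.g. in PyTorch) feeding a cross-entropy loss.

```python
# Sketch of the ArcFace additive angular margin applied to logits.
# Illustrative numpy code, not a trainable layer.
import numpy as np

def arcface_logits(embeddings, weights, labels, s=32.0, m=0.5):
    """embeddings: (N, D) samples, weights: (C, D) class centers.
    Both are L2-normalized so dot products are cosines. Returns (N, C)
    logits with the margin m added to each sample's true-class angle."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = e @ w.T                                  # cosine similarities
    theta = np.arccos(np.clip(cos, -1.0, 1.0))     # angles in radians
    # Add the angular margin only to the ground-truth class angle.
    theta[np.arange(len(labels)), labels] += m
    return s * np.cos(theta)

rng = np.random.default_rng(1)
emb = rng.standard_normal((4, 16))
cls = rng.standard_normal((3, 16))
labels = np.array([0, 1, 2, 0])
logits = arcface_logits(emb, cls, labels)
```

Penalizing the true-class angle this way forces embeddings of the same class to cluster more tightly on the hypersphere, which is why the loss transfers well from face recognition to speaker verification.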
Proceedings ArticleDOI

X-Vectors: Robust DNN Embeddings for Speaker Recognition

TL;DR: This paper uses data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness of deep neural network embeddings for speaker recognition.
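The augmentation described above mixes noise into clean speech at a controlled signal-to-noise ratio. Here is an illustrative numpy sketch of additive-noise augmentation at a target SNR, in the spirit of the x-vector recipe but not the actual Kaldi pipeline; the waveform lengths and SNR value are made up for the example.

```python
# Sketch of additive-noise augmentation at a chosen SNR.
# Illustrative numpy code, not the Kaldi x-vector augmentation pipeline.
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix `noise` into `speech`, scaled so the mixture has the given SNR."""
    noise = np.resize(noise, speech.shape)          # tile/crop to length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose scale so that 10*log10(p_speech / (scale**2 * p_noise)) == snr_db.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(2)
speech = rng.standard_normal(16000)                 # 1 s of audio at 16 kHz
noise = rng.standard_normal(8000)                   # shorter noise clip
noisy = add_noise(speech, noise, snr_db=10)
```

Reverberation augmentation is analogous but convolves the speech with a room impulse response instead of adding a scaled noise signal.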