scispace - formally typeset
Arsha Nagrani

Researcher at University of Oxford

Publications - 65
Citations - 5433

Arsha Nagrani is an academic researcher from the University of Oxford. The author has contributed to research topics including Computer science and Speaker recognition. The author has an h-index of 21, and has co-authored 54 publications receiving 3164 citations. Previous affiliations of Arsha Nagrani include Google.

Papers
Proceedings ArticleDOI

VoxCeleb2: Deep Speaker Recognition.

TL;DR: In this article, a large-scale audio-visual speaker recognition dataset, VoxCeleb2, is presented, which contains over a million utterances from over 6,000 speakers.
Proceedings ArticleDOI

VoxCeleb: A Large-Scale Speaker Identification Dataset.

TL;DR: This paper proposes a fully automated pipeline based on computer vision techniques to create a large-scale, text-independent speaker identification dataset collected 'in the wild', and shows that a CNN-based architecture obtains the best performance for both identification and verification.
Journal ArticleDOI

VoxCeleb: Large-scale speaker verification in the wild

TL;DR: This paper introduces a very large-scale audio-visual dataset collected from open-source media using a fully automated pipeline, and develops and compares different CNN architectures with various aggregation methods and training loss functions that can effectively recognise identities from voice under various conditions.
Proceedings ArticleDOI

Utterance-level Aggregation for Speaker Recognition in the Wild

TL;DR: This paper proposes a powerful speaker recognition deep network, using a ‘thin-ResNet’ trunk architecture, and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features across time, that can be trained end-to-end.
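The VLAD-style aggregation described in this TL;DR can be sketched as follows. This is a minimal, untrained illustration of the idea — soft-assign each frame-level feature to a set of cluster centers and sum the weighted residuals into a fixed-size utterance descriptor; the toy dimensions, dot-product similarity, and lack of learned parameters are assumptions for clarity, not the paper's exact NetVLAD/GhostVLAD layer:

```python
import math

def softmax(scores):
    # numerically stable softmax over a list of similarity scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def vlad_aggregate(frames, centers):
    """Aggregate T frame features (each D-dim) into one K*D descriptor.

    Each frame is softly assigned to K cluster centers; the residuals
    (frame - center), weighted by the assignment, are summed per cluster.
    The output length depends only on K and D, not on the number of frames.
    """
    K, D = len(centers), len(centers[0])
    V = [[0.0] * D for _ in range(K)]
    for x in frames:
        # assignment weights from dot-product similarity to each center
        sims = [sum(xi * ci for xi, ci in zip(x, c)) for c in centers]
        a = softmax(sims)
        for k in range(K):
            for d in range(D):
                V[k][d] += a[k] * (x[d] - centers[k][d])
    # flatten the K x D residual sums into one utterance-level vector
    return [v for row in V for v in row]

# toy example: 3 frames of 2-D features, 2 hypothetical cluster centers
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
centers = [[1.0, 0.0], [0.0, 1.0]]
print(len(vlad_aggregate(frames, centers)))  # K*D = 4
```

The key property this sketch shows is that utterances of any length map to the same fixed-size descriptor, which is what lets the network be trained end-to-end on variable-length speech.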
Proceedings ArticleDOI

Emotion Recognition in Speech using Cross-Modal Transfer in the Wild

TL;DR: This article showed that the emotional content of speech correlates with the facial expression of the speaker, and that emotion recognition can therefore be transferred from the visual domain (faces) to the speech domain (voices) through cross-modal distillation.
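The cross-modal distillation mentioned in this TL;DR follows the standard teacher-student pattern: a face-based "teacher" model produces soft emotion labels for a video, and a speech-based "student" is trained to match them. The sketch below is a minimal illustration of that loss, assuming hypothetical emotion logits; the temperature value and logits are illustrative, not taken from the paper:

```python
import math

def soften(logits, temperature):
    # temperature-scaled softmax: higher temperature -> flatter targets
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened emotion posterior
    (e.g. from a face model) and the student's (e.g. from a speech model).
    Minimised when the student reproduces the teacher's distribution."""
    p = soften(teacher_logits, temperature)
    q = soften(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# hypothetical logits over 3 emotion classes for the same video clip
teacher = [2.0, 0.5, -1.0]   # from the face (visual) model
student = [1.5, 0.7, -0.5]   # from the speech (audio) model
print(distillation_loss(teacher, student) > 0)
```

By Gibbs' inequality the loss is smallest when the student's distribution equals the teacher's, which is what drives the speech model toward the face model's emotion predictions without any manually labelled audio.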