Sourish Chaudhuri

Researcher at Google

Publications -  36
Citations -  2621

Sourish Chaudhuri is an academic researcher at Google. The author has contributed to research on topics including collaborative learning and speaker diarisation. The author has an h-index of 13 and has co-authored 36 publications receiving 1711 citations. Previous affiliations of Sourish Chaudhuri include Carnegie Mellon University.

Papers
Proceedings ArticleDOI

CNN architectures for large-scale audio classification

TL;DR: In this paper, the authors used various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels.
Posted Content

CNN Architectures for Large-Scale Audio Classification

TL;DR: This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels. It also investigates varying the size of both the training set and the label vocabulary, finding that analogs of the CNNs used in image classification do well on the audio classification task, and that larger training and label sets help up to a point.
Proceedings ArticleDOI

Non-negative matrix factorization based compensation of music for automatic speech recognition.

TL;DR: Non-negative matrix factorization based speech enhancement is proposed for robust automatic recognition of mixtures of speech and music, and is shown to produce a consistent, significant improvement in recognition performance compared with the baseline method.
Posted Content

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

TL;DR: This paper presents the AVA Active Speaker detection dataset (AVA-ActiveSpeaker), which has been publicly released to facilitate algorithm development and comparison, and introduces a state-of-the-art, jointly trained audio-visual model for real-time active speaker detection and compares several variants.
Proceedings ArticleDOI

AVA Active Speaker: An Audio-Visual Dataset for Active Speaker Detection

TL;DR: The AVA Active Speaker dataset (AVA-ActiveSpeaker) contains temporally labeled face tracks in videos, where each face instance is labeled as speaking or not, and whether the speech is audible.