Sourish Chaudhuri

Researcher at Google

Publications - 36

Citations - 2621

Sourish Chaudhuri is an academic researcher at Google. The author has contributed to research on topics including collaborative learning and speaker diarisation. The author has an h-index of 13 and has co-authored 36 publications receiving 1711 citations. Previous affiliations of Sourish Chaudhuri include Carnegie Mellon University.

Papers

Proceedings ArticleDOI

CNN architectures for large-scale audio classification

Shawn Hershey, +12 more

TL;DR: In this paper, the authors used various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels.

Posted Content

CNN Architectures for Large-Scale Audio Classification

Shawn Hershey, +12 more

29 Sep 2016

arXiv: Sound

TL;DR: This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels. It also investigates varying the size of both the training set and the label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and that larger training and label sets help up to a point.

Proceedings ArticleDOI

Non-negative matrix factorization based compensation of music for automatic speech recognition.

Bhiksha Raj, +3 more

TL;DR: Non-negative matrix factorization based speech enhancement is proposed for robust automatic recognition of mixtures of speech and music, and is shown to produce a consistent, significant improvement in recognition performance compared with the baseline method.

Posted Content

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

Joseph Roth, +10 more

05 Jan 2019

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper presents the AVA Active Speaker detection dataset (AVA-ActiveSpeaker), which has been publicly released to facilitate algorithm development and comparison. It also introduces a state-of-the-art, jointly trained audio-visual model for real-time active speaker detection and compares several variants.

Proceedings ArticleDOI

Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection

Joseph Roth, +10 more

TL;DR: The AVA Active Speaker dataset (AVA-ActiveSpeaker) contains temporally labeled face tracks in videos, where each face instance is labeled as speaking or not, and whether the speech is audible.