Jort F. Gemmeke

Researcher at Katholieke Universiteit Leuven

Publications: 95
Citations: 6306

Jort F. Gemmeke is an academic researcher from Katholieke Universiteit Leuven. The author has contributed to research on the topics of speech processing and acoustic models. The author has an h-index of 23 and has co-authored 95 publications receiving 4398 citations. Previous affiliations of Jort F. Gemmeke include Google and Radboud University Nijmegen.

Papers
Proceedings ArticleDOI

Audio Set: An ontology and human-labeled dataset for audio events

TL;DR: This paper describes the creation of Audio Set, a large-scale dataset of manually annotated audio events that aims to bridge the gap in data availability between image and audio research and to substantially stimulate the development of high-performance audio event recognizers.
Proceedings ArticleDOI

CNN architectures for large-scale audio classification

TL;DR: In this paper, the authors used various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels.
Posted Content

CNN Architectures for Large-Scale Audio Classification

TL;DR: This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.
Journal ArticleDOI

Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition

TL;DR: The results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB.
Journal ArticleDOI

Compressive Sensing for Missing Data Imputation in Noise Robust Speech Recognition

TL;DR: This paper introduces a novel non-parametric, exemplar-based method for reconstructing clean speech from noisy observations, based on techniques from the field of Compressive Sensing, which can impute missing features using larger time windows such as entire words.