Joint time-frequency scattering for audio classification
Joakim Andén, Vincent Lostanlen, Stéphane Mallat
pp. 1-6
TL;DR: It is shown that this descriptor successfully characterizes complex time-frequency phenomena such as time-varying filters and frequency-modulated excitations, with results reported on the TIMIT dataset.
Abstract:
We introduce the joint time-frequency scattering transform, a time-shift-invariant descriptor of time-frequency structure for audio classification. It is obtained by applying a two-dimensional wavelet transform in time and log-frequency to a time-frequency wavelet scalogram. We show that this descriptor successfully characterizes complex time-frequency phenomena such as time-varying filters and frequency-modulated excitations. State-of-the-art results are achieved for signal reconstruction and phone segment classification on the TIMIT dataset.
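The construction described in the abstract can be illustrated with a minimal NumPy sketch: compute a constant-Q wavelet scalogram (rows ordered by log-frequency), apply separable two-dimensional wavelets over (log-frequency, time) in the Fourier domain, take the modulus, and average over time. This is not the authors' implementation. The Gaussian band-pass filters stand in for Morlet wavelets, and the helper names (`gabor_fourier`, `scalogram`, `joint_scattering`), the centre frequencies, the bandwidths and the use of a global time average instead of a low-pass filter of finite scale are all assumptions made for illustration.

```python
import numpy as np

def gabor_fourier(n, xi, sigma):
    """Gaussian band-pass filter sampled on the FFT grid (cycles per sample).
    Simplified stand-in for a Morlet wavelet (assumption)."""
    omega = np.fft.fftfreq(n)
    return np.exp(-((omega - xi) ** 2) / (2.0 * sigma ** 2))

def scalogram(x, Q=8, J=6, xi_max=0.4):
    """First layer U1: modulus of a constant-Q wavelet transform, shape (Q*J, len(x)).
    Centre frequencies are spaced geometrically, so the row index is linear in log-frequency."""
    n = len(x)
    X = np.fft.fft(x)
    xis = xi_max * 2.0 ** (-np.arange(Q * J) / Q)
    return np.array([np.abs(np.fft.ifft(X * gabor_fourier(n, xi, xi / (2 * Q))))
                     for xi in xis])

def joint_scattering(x, Q=8, J=6,
                     time_xis=(0.25, 0.125, 0.0625),  # modulation frequencies along time (assumed)
                     freq_xis=(0.25, 0.125)):          # modulation frequencies along log-frequency (assumed)
    U1 = scalogram(x, Q, J)
    n_freq, n_time = U1.shape
    U1_hat = np.fft.fft2(U1)          # 2-D FFT over (log-frequency, time)
    S1 = U1.mean(axis=1)              # first order: scalogram averaged over time
    S2 = []
    for xi_t in time_xis:
        psi_t = gabor_fourier(n_time, xi_t, xi_t / 2)
        for xi_f in freq_xis:
            for spin in (1, -1):      # sign of the log-frequency wavelet: two orientations in the plane
                psi_f = gabor_fourier(n_freq, spin * xi_f, xi_f / 2)
                # separable 2-D wavelet applied by pointwise product in the 2-D Fourier domain
                U2 = np.abs(np.fft.ifft2(U1_hat * np.outer(psi_f, psi_t)))
                S2.append(U2.mean(axis=1))   # time average -> time-shift invariance
    return S1, np.array(S2)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
S1, S2 = joint_scattering(x)
print(S1.shape, S2.shape)
```

Because every filter is applied by circular convolution and the final step is a full average over time, the coefficients of this sketch are exactly invariant to circular time shifts of `x`; a practical implementation would instead average with a low-pass filter of finite width and subsample, trading some invariance for temporal resolution.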
Citations
Journal Article
Understanding deep convolutional networks
TL;DR: Deep convolutional networks provide state-of-the-art classification and regression results over many high-dimensional problems, and a mathematical framework is introduced to analyze their properties.
Journal Article
Per-Channel Energy Normalization: Why and How
Vincent Lostanlen, Justin Salamon, Mark Cartwright, Brian McFee, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello
TL;DR: This letter investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, from both theoretical and practical standpoints, and describes the asymptotic regimes in PCEN: temporal integration, gain control, and dynamic range compression.
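The three stages named in the TL;DR can be read off the PCEN formula as it is commonly written, PCEN(t, f) = (E(t, f) / (ε + M(t, f))^α + δ)^r − δ^r, with the smoother M(t, f) = (1 − s) M(t − 1, f) + s E(t, f). The sketch below is an illustrative NumPy version, not the letter's code: the function name `pcen`, the default parameter values and the choice to initialize the smoother with the first frame are assumptions.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a nonnegative (frequency, time) energy array E.
    Parameter defaults are commonly cited values, assumed here for illustration."""
    M = np.empty_like(E, dtype=float)
    M[:, 0] = E[:, 0]  # initialize the smoother with the first frame (assumption)
    for t in range(1, E.shape[1]):
        # temporal integration: first-order IIR low-pass filter, run per frequency channel
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    gain = E / (eps + M) ** alpha          # adaptive gain control
    return (gain + delta) ** r - delta ** r  # dynamic range compression (root with bias)

rng = np.random.default_rng(0)
E = rng.random((4, 50)) + 0.1
P = pcen(E)
```

With alpha = 1 and eps = 0, the smoother scales linearly with the input, so multiplying E by a constant leaves the output unchanged; this is the gain-control behavior that makes PCEN robust to loudness differences between channels and recordings.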
Journal Article
Robust sound event detection in bioacoustic sensor networks
Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello
TL;DR: The authors propose a method for detecting avian flight calls in a ten-hour recording of nocturnal bird migration, captured by a network of six autonomous recording units (ARUs) in the presence of heterogeneous background noise.
Proceedings Article
Extended playing techniques: the next milestone in musical instrument recognition
TL;DR: This work identifies and discusses three necessary conditions for significantly outperforming the traditional mel-frequency cepstral coefficient (MFCC) baseline: the addition of second-order scattering coefficients to account for amplitude modulation, the incorporation of long-range temporal dependencies, and metric learning using large-margin nearest neighbors (LMNN) to reduce intra-class variability.
Proceedings Article
Exponential decay of scattering coefficients
TL;DR: It is shown that the norm of the scattering coefficients at a given layer depends only on the values of the signal outside a frequency band whose size is exponential in the depth of the layer.
References
Journal Article
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal Article
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
S. Davis, Paul Mermelstein
TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.
Proceedings Article
Convolutional networks and applications in vision
TL;DR: New unsupervised learning algorithms and new non-linear stages that allow ConvNets to be trained with very few labeled samples are described, along with applications to visual object recognition and vision-based navigation for off-road mobile robots.
Proceedings Article
Unsupervised feature learning for audio classification using convolutional deep belief networks
TL;DR: In this paper, the authors apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks and show that the learned features correspond to phones/phonemes.
Journal Article
Group Invariant Scattering
TL;DR: This paper constructs translation-invariant operators on $L^2(\mathbb{R}^d)$ which are Lipschitz-continuous to the action of diffeomorphisms. Scattering operators are then extended to $L^2(G)$, where $G$ is a compact Lie group, and are invariant under the action of $G$.
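The translation-invariant scattering operator summarized above cascades wavelet transforms and modulus nonlinearities, then averages. The one-dimensional NumPy sketch below computes coefficients of orders 0, 1 and 2 under simplifying assumptions: Gaussian band-pass filters replace Morlet wavelets, a global mean replaces the low-pass averaging filter, and the helper names and parameters (`gabor_fourier`, `centre_frequencies`, `scattering_1d`, `J`, `Q1`, `Q2`) are hypothetical. It is an illustration of the technique, not the paper's construction.

```python
import numpy as np

def gabor_fourier(n, xi, sigma):
    """Gaussian band-pass filter on the FFT grid (simplified Morlet stand-in, assumption)."""
    omega = np.fft.fftfreq(n)
    return np.exp(-((omega - xi) ** 2) / (2.0 * sigma ** 2))

def centre_frequencies(J, Q, xi_max=0.4):
    """Geometrically spaced centre frequencies: Q per octave over J octaves."""
    return xi_max * 2.0 ** (-np.arange(J * Q) / Q)

def scattering_1d(x, J=6, Q1=8, Q2=1):
    """Scattering coefficients of orders 0, 1, 2 with global averaging."""
    n = len(x)
    X = np.fft.fft(x)
    xi1s, xi2s = centre_frequencies(J, Q1), centre_frequencies(J, Q2)
    S0 = np.array([x.mean()])
    S1, S2 = [], []
    for xi1 in xi1s:
        U1 = np.abs(np.fft.ifft(X * gabor_fourier(n, xi1, xi1 / (2 * Q1))))  # |x * psi_1|
        S1.append(U1.mean())                   # averaging -> translation invariance
        U1_hat = np.fft.fft(U1)
        for xi2 in xi2s:
            if xi2 >= xi1:                     # keep only paths with a lower second-order frequency
                continue
            U2 = np.abs(np.fft.ifft(U1_hat * gabor_fourier(n, xi2, xi2 / (2 * Q2))))  # ||x * psi_1| * psi_2|
            S2.append(U2.mean())
    return S0, np.array(S1), np.array(S2)

rng = np.random.default_rng(1)
x = rng.standard_normal(2048)
S0, S1, S2 = scattering_1d(x)
```

Second-order paths are restricted to wavelets lower in frequency than the first-order one because the envelope |x * psi_1| carries most of its energy at frequencies below the first wavelet's centre frequency. The global mean gives exact invariance to circular shifts, whereas averaging at a finite scale yields invariance only up to that scale.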