Proceedings ArticleDOI

Loss Function Approaches for Multi-label Music Tagging

TLDR
This paper presents an ensemble-based convolutional neural network (CNN) model trained with various loss functions for tagging musical genres from audio, and investigates the effect of different loss functions and resampling strategies on prediction performance.
Abstract
Given the ever-increasing volume of music created and released every day, it has never been more important to study automatic music tagging. In this paper, we present an ensemble-based convolutional neural network (CNN) model trained using various loss functions for tagging musical genres from audio. We investigate the effect of different loss functions and resampling strategies on prediction performance, finding that using focal loss improves overall performance on the MTG-Jamendo dataset: an imbalanced, multi-label dataset with over 18,000 songs in the public domain, containing 57 labels. Additionally, we report results from varying the receptive field of our base classifier, a CNN-based architecture trained using Mel spectrograms, which also boosts model performance and yields state-of-the-art results on the Jamendo dataset. We conclude that the choice of loss function is paramount for improving on existing methods in music tagging, particularly in the presence of class imbalance.
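As a rough illustration of the per-label focal loss discussed above, the sketch below shows a minimal PyTorch-style implementation for a 57-tag multi-label classifier such as one trained on MTG-Jamendo. The hyperparameters (gamma, alpha) and the mean reduction are hypothetical defaults, not the authors' exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelFocalLoss(nn.Module):
    """Sigmoid (per-label) focal loss for multi-label tagging.
    Illustrative sketch only; gamma and alpha are assumed defaults."""
    def __init__(self, gamma: float = 2.0, alpha: float = 0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Unreduced binary cross-entropy for every (example, label) pair.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        # p_t is the predicted probability of the true label state.
        p_t = targets * p + (1 - targets) * (1 - p)
        # alpha_t balances positive vs. negative labels.
        alpha_t = targets * self.alpha + (1 - targets) * (1 - self.alpha)
        # (1 - p_t)^gamma down-weights labels the model already gets right.
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()

# Toy usage: a batch of CNN logits for 57 tags (as in MTG-Jamendo).
criterion = MultiLabelFocalLoss()
logits = torch.randn(8, 57)                      # pre-sigmoid model outputs
targets = torch.randint(0, 2, (8, 57)).float()   # multi-hot tag labels
print(criterion(logits, targets))
```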

Citations
Proceedings ArticleDOI

Supervised and Unsupervised Learning of Audio Representations for Music Understanding

TL;DR: This work shows that models trained via supervised learning on large-scale, expert-annotated music datasets achieve state-of-the-art performance across a wide range of music labelling tasks, each with novel content and vocabularies, and it restricts the pre-training dataset to the music domain to allow training with smaller batch sizes.
Proceedings ArticleDOI

On the Role of Visual Context in Enriching Music Representations

TL;DR: In this article, a contrastive learning framework is proposed to learn music representations from audio and the accompanying music videos, showing that visual context can contribute additive robustness to audio representations and indicating to what extent musical elements are affected or determined by that context.
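The general idea behind such a framework can be sketched with a generic InfoNCE-style objective over paired audio and video embeddings; this is an assumption-laden illustration (temperature, embedding size, and batch pairing are made up), not the article's actual objective.

```python
import torch
import torch.nn.functional as F

def info_nce(audio_emb: torch.Tensor, video_emb: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Generic contrastive (InfoNCE) loss for paired audio/video embeddings."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature       # cosine similarities between all pairs
    targets = torch.arange(a.size(0))      # matching clips sit on the diagonal
    return F.cross_entropy(logits, targets)

audio = torch.randn(16, 128)   # toy audio embeddings
video = torch.randn(16, 128)   # toy video embeddings from the same clips
print(info_nce(audio, video))
```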
Journal ArticleDOI

Creating musical features using multi-faceted, multi-task encoders based on transformers

TL;DR: In this article, self-attention bidirectional transformers are used to generate audio-musical features that support music understanding, leveraging self-supervision and cross-domain learning.
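As a loose sketch of this kind of model, the snippet below runs a bidirectional (non-causal) transformer encoder over Mel-spectrogram frames and attaches two hypothetical task heads; the layer sizes, number of heads, and tasks are assumptions rather than the article's architecture.

```python
import torch
import torch.nn as nn

class MultiTaskMusicEncoder(nn.Module):
    """Toy multi-task transformer encoder over Mel-spectrogram frames."""
    def __init__(self, n_mels=96, d_model=256, n_heads=4, n_layers=4, n_tags=57):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.tag_head = nn.Linear(d_model, n_tags)   # e.g. tag prediction
        self.tempo_head = nn.Linear(d_model, 1)      # e.g. tempo regression

    def forward(self, mel):                  # mel: (batch, frames, n_mels)
        h = self.encoder(self.proj(mel))     # self-attention sees all frames (bidirectional)
        pooled = h.mean(dim=1)               # simple temporal pooling
        return self.tag_head(pooled), self.tempo_head(pooled)

model = MultiTaskMusicEncoder()
mel = torch.randn(2, 400, 96)                # toy batch of Mel spectrograms
tags, tempo = model(mel)
print(tags.shape, tempo.shape)
```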
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
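The core Adam update can be written in a few lines; the NumPy sketch below uses the paper's default moment and epsilon values, with a larger learning rate chosen only so the toy example converges quickly. It is a didactic illustration, not a production optimizer.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: adaptive first/second moment estimates with bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (uncentered) estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 5.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)   # driven close to the minimizer at 0
```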
Proceedings ArticleDOI

Focal Loss for Dense Object Detection

TL;DR: This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
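The effect of the modulating factor is easy to see numerically: with the focusing parameter gamma = 2 (a commonly used value), the loss for well-classified examples shrinks by orders of magnitude relative to cross entropy. The snippet below is just that arithmetic, not the paper's detector code.

```python
import math

def cross_entropy(p_t):
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0):
    # Cross entropy reshaped by the (1 - p_t)^gamma modulating factor.
    return -((1 - p_t) ** gamma) * math.log(p_t)

for p_t in (0.6, 0.9, 0.99):
    ce, fl = cross_entropy(p_t), focal_loss(p_t)
    print(f"p_t={p_t:.2f}  CE={ce:.4f}  FL={fl:.4f}  ratio={fl/ce:.3f}")
# Well-classified examples (high p_t) are strongly down-weighted,
# so the many easy negatives no longer dominate the total loss.
```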
Proceedings Article

mixup: Beyond Empirical Risk Minimization

TL;DR: This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
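A minimal NumPy sketch of the mixup idea, applied here to toy Mel-spectrogram patches and multi-hot tag vectors; the Beta parameter alpha = 0.2 is a hypothetical choice.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Convex combination of two examples and their labels (illustrative sketch)."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Toy usage: mix two Mel-spectrogram patches and their tag vectors.
x1, x2 = np.random.rand(96, 128), np.random.rand(96, 128)
y1 = np.array([1.0, 0.0, 1.0])
y2 = np.array([0.0, 1.0, 0.0])
x_mix, y_mix = mixup(x1, y1, x2, y2)
print(x_mix.shape, y_mix)
```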
Journal ArticleDOI

A systematic study of the class imbalance problem in convolutional neural networks

TL;DR: The effect of class imbalance on classification performance is detrimental; the method for addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; and thresholding should be applied to compensate for prior class probabilities when the overall number of properly classified cases is of interest.
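The two remedies highlighted by that study can be sketched in a few lines; both the per-example weighting scheme and the prior-compensated cut-off below are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np

def oversampling_weights(labels):
    """Per-example sampling weights that oversample examples carrying rare labels."""
    class_freq = labels.mean(axis=0) + 1e-8     # empirical label frequencies
    per_class_weight = 1.0 / class_freq         # rarer label -> larger weight
    # Weight each example by its rarest positive label (floor of 1 for all-negative rows).
    return np.maximum((labels * per_class_weight).max(axis=1), 1.0)

def thresholded_predictions(probs, priors):
    """Compensate for prior class probabilities by thresholding probs / priors."""
    return (probs / (priors + 1e-8)) > 1.0

labels = np.random.randint(0, 2, size=(100, 5)).astype(float)   # toy multi-hot labels
probs = np.random.rand(100, 5)                                   # toy predicted probabilities
print(oversampling_weights(labels)[:5])
print(thresholded_predictions(probs, labels.mean(axis=0))[:2])
```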
Proceedings ArticleDOI

The million song dataset

TL;DR: The Million Song Dataset, a freely available collection of audio features and metadata for a million contemporary popular music tracks, is introduced; positive results on year prediction are shown, and the future development of the dataset is discussed.
Trending Questions (1)
Can Focal Loss be used to solve imbalanced label problems in NLP?

The paper suggests that using Focal Loss improves performance on an imbalanced, multi-label music tagging dataset, but it does not specifically mention NLP.