Proceedings ArticleDOI

Loss Function Approaches for Multi-label Music Tagging

TLDR
This paper presents an ensemble-based convolutional neural network (CNN) model trained with various loss functions for tagging musical genres from audio, and investigates the effect of different loss functions and resampling strategies on prediction performance.
Abstract
Given the ever-increasing volume of music created and released every day, it has never been more important to study automatic music tagging. In this paper, we present an ensemble-based convolutional neural network (CNN) model trained using various loss functions for tagging musical genres from audio. We investigate the effect of different loss functions and resampling strategies on prediction performance, finding that using focal loss improves overall performance on the MTG-Jamendo dataset: an imbalanced, multi-label dataset with over 18,000 songs in the public domain, containing 57 labels. Additionally, we report results from varying the receptive field of our base classifier, a CNN-based architecture trained using Mel spectrograms, which also boosts model performance and yields state-of-the-art results on the Jamendo dataset. We conclude that the choice of loss function is paramount for improving on existing methods in music tagging, particularly in the presence of class imbalance.
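As a rough illustration of the per-label focal loss discussed above, the sketch below shows a minimal PyTorch-style implementation for a 57-tag multi-label classifier such as one trained on MTG-Jamendo. The hyperparameters (gamma, alpha) and the mean reduction are hypothetical defaults, not the authors' exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelFocalLoss(nn.Module):
    """Sigmoid (per-label) focal loss for multi-label tagging.
    Illustrative sketch only; gamma and alpha are assumed defaults."""
    def __init__(self, gamma: float = 2.0, alpha: float = 0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Unreduced binary cross-entropy for every (example, label) pair.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        # p_t is the predicted probability of the true label state.
        p_t = targets * p + (1 - targets) * (1 - p)
        # alpha_t balances positive vs. negative labels.
        alpha_t = targets * self.alpha + (1 - targets) * (1 - self.alpha)
        # (1 - p_t)^gamma down-weights labels the model already gets right.
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()

# Toy usage: a batch of CNN logits for 57 tags (as in MTG-Jamendo).
criterion = MultiLabelFocalLoss()
logits = torch.randn(8, 57)                      # pre-sigmoid model outputs
targets = torch.randint(0, 2, (8, 57)).float()   # multi-hot tag labels
print(criterion(logits, targets))
```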

Citations
Proceedings ArticleDOI

Supervised and Unsupervised Learning of Audio Representations for Music Understanding

TL;DR: This work shows that models trained via supervised learning on large-scale, expert-annotated music datasets achieve state-of-the-art performance across a wide range of music labelling tasks, each with novel content and vocabularies, and it restricts the pre-training dataset to the music domain to allow training with smaller batch sizes.
Proceedings ArticleDOI

On the Role of Visual Context in Enriching Music Representations

TL;DR: In this article, a contrastive learning framework is proposed to learn music representations from audio and the accompanying music videos, showing that visual context can contribute additive robustness to audio representations and indicating to what extent musical elements are affected or determined by that context.
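The general idea behind such a framework can be sketched with a generic InfoNCE-style objective over paired audio and video embeddings; this is an assumption-laden illustration (temperature, embedding size, and batch pairing are made up), not the article's actual objective.

```python
import torch
import torch.nn.functional as F

def info_nce(audio_emb: torch.Tensor, video_emb: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Generic contrastive (InfoNCE) loss for paired audio/video embeddings."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature       # cosine similarities between all pairs
    targets = torch.arange(a.size(0))      # matching clips sit on the diagonal
    return F.cross_entropy(logits, targets)

audio = torch.randn(16, 128)   # toy audio embeddings
video = torch.randn(16, 128)   # toy video embeddings from the same clips
print(info_nce(audio, video))
```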
Journal ArticleDOI

Creating musical features using multi-faceted, multi-task encoders based on transformers

TL;DR: In this article, self-attention bidirectional transformers are used to generate audio-musical features that support music understanding, leveraging self-supervision and cross-domain learning.
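As a loose sketch of this kind of model, the snippet below runs a bidirectional (non-causal) transformer encoder over Mel-spectrogram frames and attaches two hypothetical task heads; the layer sizes, number of heads, and tasks are assumptions rather than the article's architecture.

```python
import torch
import torch.nn as nn

class MultiTaskMusicEncoder(nn.Module):
    """Toy multi-task transformer encoder over Mel-spectrogram frames."""
    def __init__(self, n_mels=96, d_model=256, n_heads=4, n_layers=4, n_tags=57):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.tag_head = nn.Linear(d_model, n_tags)   # e.g. tag prediction
        self.tempo_head = nn.Linear(d_model, 1)      # e.g. tempo regression

    def forward(self, mel):                  # mel: (batch, frames, n_mels)
        h = self.encoder(self.proj(mel))     # self-attention sees all frames (bidirectional)
        pooled = h.mean(dim=1)               # simple temporal pooling
        return self.tag_head(pooled), self.tempo_head(pooled)

model = MultiTaskMusicEncoder()
mel = torch.randn(2, 400, 96)                # toy batch of Mel spectrograms
tags, tempo = model(mel)
print(tags.shape, tempo.shape)
```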
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
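The core Adam update can be written in a few lines; the NumPy sketch below uses the paper's default moment and epsilon values, with a larger learning rate chosen only so the toy example converges quickly. It is a didactic illustration, not a production optimizer.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: adaptive first/second moment estimates with bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (uncentered) estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 5.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)   # driven close to the minimizer at 0
```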
Proceedings ArticleDOI

Focal Loss for Dense Object Detection

TL;DR: This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
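The effect of the modulating factor is easy to see numerically: with the focusing parameter gamma = 2 (a commonly used value), the loss for well-classified examples shrinks by orders of magnitude relative to cross entropy. The snippet below is just that arithmetic, not the paper's detector code.

```python
import math

def cross_entropy(p_t):
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0):
    # Cross entropy reshaped by the (1 - p_t)^gamma modulating factor.
    return -((1 - p_t) ** gamma) * math.log(p_t)

for p_t in (0.6, 0.9, 0.99):
    ce, fl = cross_entropy(p_t), focal_loss(p_t)
    print(f"p_t={p_t:.2f}  CE={ce:.4f}  FL={fl:.4f}  ratio={fl/ce:.3f}")
# Well-classified examples (high p_t) are strongly down-weighted,
# so the many easy negatives no longer dominate the total loss.
```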
Proceedings Article

mixup: Beyond Empirical Risk Minimization

TL;DR: This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
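A minimal NumPy sketch of the mixup idea, applied here to toy Mel-spectrogram patches and multi-hot tag vectors; the Beta parameter alpha = 0.2 is a hypothetical choice.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Convex combination of two examples and their labels (illustrative sketch)."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Toy usage: mix two Mel-spectrogram patches and their tag vectors.
x1, x2 = np.random.rand(96, 128), np.random.rand(96, 128)
y1 = np.array([1.0, 0.0, 1.0])
y2 = np.array([0.0, 1.0, 0.0])
x_mix, y_mix = mixup(x1, y1, x2, y2)
print(x_mix.shape, y_mix)
```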
Journal ArticleDOI

A systematic study of the class imbalance problem in convolutional neural networks

TL;DR: The effect of class imbalance on classification performance is detrimental; the method for addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; and thresholding should be applied to compensate for prior class probabilities when the overall number of properly classified cases is of interest.
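The two remedies highlighted by that study can be sketched in a few lines; both the per-example weighting scheme and the prior-compensated cut-off below are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np

def oversampling_weights(labels):
    """Per-example sampling weights that oversample examples carrying rare labels."""
    class_freq = labels.mean(axis=0) + 1e-8     # empirical label frequencies
    per_class_weight = 1.0 / class_freq         # rarer label -> larger weight
    # Weight each example by its rarest positive label (floor of 1 for all-negative rows).
    return np.maximum((labels * per_class_weight).max(axis=1), 1.0)

def thresholded_predictions(probs, priors):
    """Compensate for prior class probabilities by thresholding probs / priors."""
    return (probs / (priors + 1e-8)) > 1.0

labels = np.random.randint(0, 2, size=(100, 5)).astype(float)   # toy multi-hot labels
probs = np.random.rand(100, 5)                                   # toy predicted probabilities
print(oversampling_weights(labels)[:5])
print(thresholded_predictions(probs, labels.mean(axis=0))[:2])
```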
Proceedings ArticleDOI

The million song dataset

TL;DR: The Million Song Dataset, a freely available collection of audio features and metadata for a million contemporary popular music tracks, is introduced; positive results on year prediction are shown, and the future development of the dataset is discussed.
Trending Questions (1)
Can Focal Loss be used to solve imbalanced label problems in NLP?

The paper suggests that using Focal Loss improves performance on an imbalanced, multi-label music tagging dataset, but it does not specifically mention NLP.