Author

Jakob Abeßer

Bio: Jakob Abeßer is an academic researcher at the Fraunhofer Society. He has contributed to research on the topics of music information retrieval and bass (sound), has an h-index of 11, and has co-authored 48 publications receiving 416 citations.


Papers
Journal ArticleDOI
TL;DR: This article summarizes and groups existing approaches for data preparation, i.e., feature representations, feature pre-processing, and data augmentation, and for data modeling, i.e., neural network architectures and learning paradigms.
Abstract: The number of publications on acoustic scene classification (ASC) in environmental audio recordings has constantly increased over the last few years. This was mainly stimulated by the annual Detection and Classification of Acoustic Scenes and Events (DCASE) competition with its first edition in 2013. All competitions so far involved one or multiple ASC tasks. With a focus on deep learning based ASC algorithms, this article summarizes and groups existing approaches for data preparation, i.e., feature representations, feature pre-processing, and data augmentation, and for data modeling, i.e., neural network architectures and learning paradigms. Finally, the paper discusses current algorithmic limitations and open challenges in order to preview possible future developments towards the real-life application of ASC systems.

96 citations
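As a concrete illustration of the data-preparation stage surveyed above, the following is a minimal sketch of log-mel feature extraction plus one waveform-level augmentation, assuming librosa and numpy; the file name and parameter values are illustrative, not taken from the paper.

```python
# Sketch: log-mel features with a simple augmentation step, as commonly used
# in DCASE-style ASC pipelines. File name and parameters are illustrative.
import librosa
import numpy as np

def log_mel(y, sr, n_mels=128, n_fft=2048, hop_length=512):
    """Compute a log-scaled mel spectrogram, a typical ASC input representation."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                       hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)

def augment_pitch_shift(y, sr, n_steps=2):
    """One common waveform-level augmentation: shift pitch by n_steps semitones."""
    return librosa.effects.pitch_shift(y=y, sr=sr, n_steps=n_steps)

y, sr = librosa.load("scene.wav", sr=44100, mono=True)      # hypothetical file
features = log_mel(y, sr)                                   # (n_mels, n_frames)
features_aug = log_mel(augment_pitch_shift(y, sr), sr)      # augmented variant
```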

01 Jan 2014
TL;DR: A novel algorithm for automatic analysis, transcription, and parameter extraction from isolated polyphonic guitar recordings, which achieved very high accuracy values for onset and offset detection as well as multipitch estimation.
Abstract: In this paper we present a novel algorithm for automatic analysis, transcription, and parameter extraction from isolated polyphonic guitar recordings. In addition to general score-related information such as note onset, duration, and pitch, instrument-specific information such as the plucked string and the applied plucking and expression styles is retrieved automatically. For this purpose, we adapted several state-of-the-art approaches for onset and offset detection, multipitch estimation, string estimation, feature extraction, and multi-class classification. Furthermore, we investigated a partial tracking algorithm that is robust with respect to inharmonicity, an extensive extraction of novel and known audio features, as well as the exploitation of instrument-based knowledge in the form of plausibility filtering to obtain more reliable predictions. Our system achieved very high accuracy values of 98% for onset and offset detection as well as multipitch estimation. For the instrument-related parameters, the proposed algorithm also showed very good performance, with accuracy values of 82% for the string number, 93% for the plucking style, and 83% for the expression style. Index Terms: playing techniques, plucking style, expression style, multiple fundamental frequency estimation, string classification, fretboard position, fingering, electric guitar, inharmonicity coefficient, tablature

55 citations
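The partial tracking "with respect to inharmonicity" mentioned in the abstract relates to the standard stiff-string model, in which the k-th partial is sharpened relative to the exact harmonic k·f0. A minimal sketch of that model (not the authors' implementation; the B values are illustrative):

```python
# Sketch: partial frequencies under the standard stiff-string inharmonicity
# model f_k = k * f0 * sqrt(1 + B * k^2), which partial tracking must account
# for on plucked strings. Not the paper's implementation.
import numpy as np

def partial_frequencies(f0, n_partials, B):
    """Predicted frequencies (Hz) of the first n_partials for fundamental f0
    and inharmonicity coefficient B (dimensionless, ~1e-5..1e-3 for guitar)."""
    k = np.arange(1, n_partials + 1)
    return k * f0 * np.sqrt(1.0 + B * k**2)

# Example: low E string (82.4 Hz). With B = 0 the partials are exact harmonics;
# with B > 0 the upper partials are progressively sharpened.
print(partial_frequencies(82.4, 5, 0.0))     # [ 82.4 164.8 247.2 329.6 412. ]
print(partial_frequencies(82.4, 5, 1e-4))    # slightly stretched upward
```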

DOI
01 Jan 2012
TL;DR: Three well-known MIR methods used in music learning systems, and the state of the art for each, are described: music transcription, solo and accompaniment track creation, and generation of performance instructions.
Abstract: This paper addresses the use of Music Information Retrieval (MIR) techniques in music education and their integration in learning software. A general overview of systems that are either commercially available or in research stage is presented. Furthermore, three well-known MIR methods used in music learning systems and their state-of-the-art are described: music transcription, solo and accompaniment track creation, and generation of performance instructions. As a representative example of a music learning system developed within the MIR community, the Songs2See software is outlined. Finally, challenges and directions for future research are described.

38 citations

Proceedings Article
01 Jan 2009
TL;DR: A novel two-dimensional approach for automatic music genre classification is described, which proposes to break down multi-label genre annotations into single-label annotations within given time segments and musical domains.
Abstract: In this publication we describe a novel two-dimensional approach for automatic music genre classification. Although the subject poses a well-studied task in Music Information Retrieval, some fundamental issues of genre classification have not been covered so far. In particular, many modern genres are influenced by manifold musical styles. Most of all, this holds true for the broad category "World Music", which comprises many different regional styles and mutual mixtures thereof. A common approach to tackle this issue in manual categorization is to assign multiple genre labels to a single recording. However, for commonly used automatic classification algorithms, multi-labeling poses a problem due to its ambiguities. Thus, we propose to break down multi-label genre annotations into single-label annotations within given time segments and musical domains. A corresponding multi-stage evaluation based on a representative set of items from a global music taxonomy is performed and discussed accordingly. To this end, we conduct three different experiments that cover multi-labeling, multi-labeling with time segmentation, and the proposed multi-domain labeling.

31 citations
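To illustrate the labeling scheme the abstract proposes, the sketch below breaks a multi-label annotation into single-label annotations per time segment; the data layout, track, and genre labels are hypothetical, purely to illustrate the idea.

```python
# Sketch: breaking a multi-label genre annotation into single-label annotations
# per time segment. Data layout and labels are hypothetical.
from dataclasses import dataclass

@dataclass
class SegmentLabel:
    start: float   # segment start in seconds
    end: float     # segment end in seconds
    genre: str     # exactly one label per segment (and per musical domain)

# Instead of one ambiguous multi-label annotation for the whole recording ...
track_multilabel = {"track": "example.mp3", "genres": ["flamenco", "pop"]}

# ... each time segment receives a single dominant label:
track_segmented = [
    SegmentLabel(0.0, 45.0, "flamenco"),   # guitar intro
    SegmentLabel(45.0, 180.0, "pop"),      # verse/chorus section
]
```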

Proceedings Article
01 Jan 2018
TL;DR: This paper builds upon a recently proposed instrument recognition algorithm based on a hybrid deep neural network: a combination of convolutional and fully connected layers for learning characteristic spectral-temporal patterns.
Abstract: Predominant instrument recognition in ensemble recordings remains a challenging task, particularly if closely related instruments such as alto and tenor saxophone need to be distinguished. In this paper, we build upon a recently proposed instrument recognition algorithm based on a hybrid deep neural network: a combination of convolutional and fully connected layers for learning characteristic spectral-temporal patterns. We systematically evaluate harmonic/percussive and solo/accompaniment source separation algorithms as pre-processing steps to reduce the overlap among multiple instruments prior to the instrument recognition step. For the particular use case of solo instrument recognition in jazz ensemble recordings, we further apply transfer learning techniques to fine-tune a previously trained instrument recognition model for classifying six jazz solo instruments. Our results indicate that both source separation as a pre-processing step and transfer learning clearly improve recognition performance, especially for smaller subsets of highly similar instruments.

30 citations
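As an illustration of the harmonic/percussive pre-processing step evaluated above, the following sketch uses librosa's median-filtering HPSS, which is one standard implementation and not necessarily the variant used in the paper; the file name is hypothetical.

```python
# Sketch: harmonic/percussive source separation (HPSS) as a pre-processing step
# before instrument recognition, via librosa's median-filtering HPSS.
import librosa

y, sr = librosa.load("jazz_solo.wav", sr=22050)   # hypothetical recording
y_harmonic, y_percussive = librosa.effects.hpss(y)

# Feeding only the harmonic component to the recognition network reduces
# overlap from drums/percussion before spectral-temporal patterns are learned.
S_harm = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y_harmonic, sr=sr, n_mels=128))
```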


Cited by
Book ChapterDOI
E.R. Davies
01 Jan 1990
TL;DR: This chapter introduces the subject of statistical pattern recognition (SPR), considering how features are defined and emphasizing that the nearest neighbor algorithm achieves error rates comparable with those of an ideal Bayes’ classifier.
Abstract: This chapter introduces the subject of statistical pattern recognition (SPR). It starts by considering how features are defined and emphasizes that the nearest neighbor algorithm achieves error rates comparable with those of an ideal Bayes’ classifier. The concepts of an optimal number of features, representativeness of the training data, and the need to avoid overfitting to the training data are stressed. The chapter shows that methods such as the support vector machine and artificial neural networks are subject to these same training limitations, although each has its advantages. For neural networks, the multilayer perceptron architecture and back-propagation algorithm are described. The chapter distinguishes between supervised and unsupervised learning, demonstrating the advantages of the latter and showing how methods such as clustering and principal components analysis fit into the SPR framework. The chapter also defines the receiver operating characteristic, which allows an optimum balance between false positives and false negatives to be achieved.

1,189 citations
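The chapter's point about the nearest neighbor algorithm, that its asymptotic error rate is within roughly a factor of two of the Bayes-optimal rate, is easy to demonstrate with a minimal classifier; the dataset below is a stand-in chosen purely for illustration.

```python
# Sketch: the nearest-neighbour classifier discussed in the chapter, via
# scikit-learn. The dataset is a stand-in; any labeled feature set would do.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1-NN: classify each test point by the label of its closest training point.
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print("1-NN test accuracy:", clf.score(X_test, y_test))
```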


Journal ArticleDOI
01 Dec 2013
TL;DR: Limits of current transcription methods are analyzed and promising directions for future research are identified, including the integration of information from multiple algorithms and different musical aspects.
Abstract: Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.

298 citations

Proceedings Article
01 Jan 2017
TL;DR: A fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset is described and shown to achieve state-of-the-art performance on several multi-f0 and melody datasets.
Abstract: Estimating fundamental frequencies in polyphonic music remains a notoriously difficult task in Music Information Retrieval. While other tasks, such as beat tracking and chord recognition have seen improvement with the application of deep learning models, little work has been done to apply deep learning methods to fundamental frequency related tasks including multi-f0 and melody tracking, primarily due to the scarce availability of labeled data. In this work, we describe a fully convolutional neural network for learning salience representations for estimating fundamental frequencies, trained using a large, semi-automatically generated f0 dataset. We demonstrate the effectiveness of our model for learning salience representations for both multi-f0 and melody tracking in polyphonic audio, and show that our models achieve state-of-the-art performance on several multi-f0 and melody datasets. We conclude with directions for future research.

148 citations
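A minimal sketch of the kind of fully convolutional salience model described above, assuming a harmonic-CQT-style input with several harmonic channels; the layer sizes are illustrative and not the paper's exact architecture.

```python
# Sketch: a small fully convolutional network mapping a time-frequency input
# (e.g., a harmonic CQT with several harmonic channels) to a same-size pitch
# salience map. Layer sizes are illustrative only.
import torch
import torch.nn as nn

class SalienceNet(nn.Module):
    def __init__(self, in_channels=6):    # e.g., 6 harmonic CQT channels
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),   # 1x1 conv -> one salience channel
            nn.Sigmoid(),                      # salience in [0, 1] per bin
        )

    def forward(self, x):                  # x: (batch, channels, freq, time)
        return self.net(x)

# Fully convolutional: any number of frames works, so variable-length audio
# needs no cropping. Training would use binary cross-entropy against f0 targets.
model = SalienceNet()
salience = model(torch.randn(1, 6, 360, 50))   # -> (1, 1, 360, 50)
```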

Journal ArticleDOI
Bob L. Sturm
TL;DR: This article disproves the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults.
Abstract: The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents.

141 citations
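One of the catalogued fault classes, exact repetitions, can be checked for mechanically; the sketch below hashes file contents to find byte-identical duplicates. It will not catch the mislabelings or distortions the article also documents, and the directory layout and file extension are hypothetical.

```python
# Sketch: finding exact-duplicate files (one class of GTZAN fault) by hashing
# file contents. Only byte-identical repetitions are caught.
import hashlib
from collections import defaultdict
from pathlib import Path

def find_exact_duplicates(root, pattern="*.au"):
    """Group audio files under `root` by content hash; return groups of size > 1."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob(pattern):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_hash[digest].append(path)
    return {h: ps for h, ps in by_hash.items() if len(ps) > 1}

for digest, paths in find_exact_duplicates("gtzan/").items():
    print("duplicate group:", sorted(p.name for p in paths))
```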