Book Chapter DOI

A Multiple-Expert Framework for Instrument Recognition

TL;DR: A new approach to feature-based instrument recognition is presented that exploits redundancies in the harmonic structure and temporal development of a note and is designed to transfer to polyphonic material.
Abstract: Instrument recognition is an important task in music information retrieval (MIR). Whereas the recognition of musical instruments in monophonic recordings has been studied widely, the polyphonic case is still far from solved. A new approach to feature-based instrument recognition is presented that makes use of redundancies in the harmonic structure and temporal development of a note. The structure of the proposed method is designed to transfer to polyphonic material. Multiple feature categories are extracted and classified separately with SVM models. In a further step, class probabilities are aggregated in a two-step combination scheme. The presented system was evaluated on a dataset of 3300 isolated single notes, and different aggregation methods are compared. As the joint classification outperforms the individual categories, further development of the presented technique is motivated.
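The abstract's core idea, separate per-category experts whose class probabilities are later fused, can be illustrated with a small sketch. The paper's exact two-step combination scheme is not reproduced here; the category names, the three-class setup, and the probability values below are all illustrative assumptions, and only two generic fusion rules (mean and product) are shown.

```python
import numpy as np

# Mock per-category class probabilities for one note, as if produced by
# three separately trained SVM experts (category names are hypothetical).
expert_probs = {
    "spectral": np.array([0.6, 0.3, 0.1]),   # e.g. harmonic-structure features
    "temporal": np.array([0.5, 0.4, 0.1]),   # e.g. temporal-development features
    "frame":    np.array([0.4, 0.2, 0.4]),
}

def aggregate(probs, rule="mean"):
    """Fuse expert probability vectors into a single class distribution."""
    stacked = np.vstack(list(probs.values()))
    if rule == "mean":
        fused = stacked.mean(axis=0)
    elif rule == "product":
        fused = stacked.prod(axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return fused / fused.sum()  # renormalise to a valid distribution

for rule in ("mean", "product"):
    fused = aggregate(expert_probs, rule)
    print(rule, fused.round(3), "->", int(np.argmax(fused)))
```

With these mock values both rules agree on the winning class; in general the product rule rewards experts that are jointly confident, while the mean rule is more robust to a single badly calibrated expert.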
Citations
Proceedings Article
01 Jan 2018
TL;DR: This paper builds upon a recently proposed instrument recognition algorithm based on a hybrid deep neural network: a combination of convolutional and fully connected layers for learning characteristic spectral-temporal patterns.
Abstract: Predominant instrument recognition in ensemble recordings remains a challenging task, particularly if closely related instruments such as alto and tenor saxophone need to be distinguished. In this paper, we build upon a recently proposed instrument recognition algorithm based on a hybrid deep neural network: a combination of convolutional and fully connected layers for learning characteristic spectral-temporal patterns. We systematically evaluate harmonic/percussive and solo/accompaniment source separation algorithms as pre-processing steps to reduce the overlap among multiple instruments prior to the instrument recognition step. For the particular use-case of solo instrument recognition in jazz ensemble recordings, we further apply transfer learning techniques to fine-tune a previously trained instrument recognition model for classifying six jazz solo instruments. Our results indicate that both source separation as pre-processing step as well as transfer learning clearly improve recognition performance, especially for smaller subsets of highly similar instruments.

30 citations


Cites background or methods from "A Multiple-Expert Framework for Ins..."

  • ...Considering classification scenarios with more than 10 instrument classes, the best-performing systems achieve recognition rates above 90%, as shown for instance in [14, 27]....


  • ...ing on note-wise, frame-wise, and envelope-wise features was proposed in [14]....


Dissertation
23 Oct 2014
TL;DR: Evaluation experiments on two newly created audio datasets show that the presented transcription algorithm can achieve higher recognition accuracy on a dataset of realistic bass guitar recordings.
Abstract: Music signals typically consist of a superposition of several individual instruments. Most existing algorithms for the automatic transcription and analysis of music recordings in the research field of music information retrieval (MIR) attempt to extract semantic information directly from these mixed signals. In recent years it has frequently been observed that the performance of these algorithms is generally limited by the signal superposition and the resulting loss of information. One possible solution is to isolate the participating instruments acoustically before analysis by means of source separation techniques. At the current state of the art, however, the performance of these algorithms is not always sufficient to achieve a very good separation of the individual sources. This thesis therefore investigates only isolated instrument recordings that are not acoustically overlaid by other instruments. Using the electric bass guitar as an example, analysis and sound synthesis algorithms specialized in the sound production of this instrument are developed and evaluated. In the first part of this thesis, an algorithm is presented that performs an automatic transcription of bass guitar recordings. The audio signal is described by sound events that correspond to the notes played on the instrument. In addition to the usual note parameters onset, duration, loudness, and pitch, instrument-specific parameters such as the playing techniques used as well as the string and fret position on the instrument are extracted automatically.
Evaluation experiments on two newly created audio datasets show that the presented transcription algorithm achieves higher recognition accuracy on a dataset of realistic bass guitar recordings than three existing state-of-the-art algorithms. The estimation of the instrument-specific parameters can be performed with high quality, particularly for isolated single notes. The second part of the thesis investigates how the music genre can be inferred from a note representation of typical repeating bass lines. Audio features are extracted that quantitatively describe various tonal, rhythmic, and structural properties of bass lines. Using a newly created dataset of 520 typical bass lines from 13 different music genres, three different approaches to automatic genre classification were compared. It was shown that, using a rule-based classification method, a mean recognition rate of 64.8% could already be achieved from the analysis of a piece's bass line alone. The re-synthesis of the original bass tracks based on the extracted note parameters is investigated in the third part of the thesis. A new audio synthesis algorithm is presented that, based on the principle of physical modeling, reproduces various aspects of the sound production characteristic of the bass guitar, such as string excitation, damping, collision between string and fret, and the pickup behavior. Furthermore, a parametric audio coding approach is discussed that allows bass guitar tracks to be transmitted using only the extracted note-wise parameters and re-synthesized on the decoder side.
The results of several listening tests confirm that the proposed synthesis algorithm enables a re-synthesis of bass guitar recordings with better sound quality than transmitting the audio data with existing audio coding methods configured for very low bit rates.

10 citations


Cites methods from "A Multiple-Expert Framework for Ins..."

  • ...Instead of using the proposed feature set for the classification of plucking and expression styles, Mel-Frequency Cepstral Coefficients (MFCC) [62] were chosen as audio features since they are widely applied for comparable MIR classification tasks such as instrument recognition [71]....


  • ...For this purpose, partial tracking algorithms based on detecting spectral peaks and tracking them over time are most-often used in the literature (see for instance [71])....


Book Chapter DOI
14 Oct 2019
TL;DR: In this paper, a new dataset of Colombian Andean string music was compiled and annotated by musicological experts, and different neural network architectures, as well as pre-processing steps and data augmentation techniques were systematically evaluated and optimized.
Abstract: Reliable methods for automatic retrieval of semantic information from large digital music archives can play a critical role in musicological research and musical heritage preservation. With the advancement of machine learning techniques, new possibilities for information retrieval in scenarios where ground-truth data is scarce are now available. This work investigates the problem of ensemble size classification in music recordings. For this purpose, a new dataset of Colombian Andean string music was compiled and annotated by musicological experts. Different neural network architectures, as well as pre-processing steps and data augmentation techniques were systematically evaluated and optimized. The best deep neural network architecture achieved 81.5% file-wise mean class accuracy using only feed forward layers with linear magnitude spectrograms as input representation. This model will serve as a baseline for future research on ensemble size classification.

3 citations

References
Journal Article DOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
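The probability estimates mentioned in the abstract are easy to exercise through scikit-learn, whose `SVC` class is backed by LIBSVM. The toy one-dimensional data below is an assumption for illustration; `probability=True` switches on LIBSVM's cross-validated (Platt-scaling) probability outputs.

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn's SVC wraps LIBSVM

# Three well-separated 1-D clusters as mock training data (not from the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.15, size=(20, 1)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 20)

# probability=True enables LIBSVM's internal probability calibration.
clf = SVC(kernel="rbf", C=10.0, gamma="scale", probability=True,
          random_state=0).fit(X, y)

probs = clf.predict_proba([[0.05]])[0]   # one distribution over 3 classes
print(probs.round(3))
print("predicted class:", clf.predict([[0.05]])[0])
```

Note that `predict` uses the one-vs-one decision values directly, so on borderline points it can occasionally disagree with the argmax of `predict_proba`; the calibration is fitted separately.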

40,826 citations

Proceedings Article DOI
21 Oct 2001
TL;DR: Several features were compared with regard to recognition performance in a musical instrument recognition system and the confusions made by the system were analysed and compared to results reported in a human perception experiment.
Abstract: Several features were compared with regard to recognition performance in a musical instrument recognition system. Both mel-frequency and linear prediction cepstral and delta cepstral coefficients were calculated. Linear prediction analysis was carried out both on a uniform and a warped frequency scale, and reflection coefficients were also used as features. The performance of earlier described features relating to the temporal development, modulation properties, brightness, and spectral synchrony of sounds was also analysed. The data base consisted of 5286 acoustic and synthetic solo tones from 29 different Western orchestral instruments, out of which 16 instruments were included in the test set. The best performance for solo tone recognition, 35% for individual instruments and 77% for families, was obtained with a feature set consisting of two sets of mel-frequency cepstral coefficients and a subset of the other analysed features. The confusions made by the system were analysed and compared to results reported in a human perception experiment.

150 citations


"A Multiple-Expert Framework for Ins..." refers background in this paper

  • ...These concepts regard spectral properties of fixed-length segments [3], or take into account the temporal properties of isolated notes [5]....


Journal Article DOI
TL;DR: Comparison to a human perception experiment with sounds produced by the same instruments indicates that, under these conditions, computers do as well as humans in identifying woodwind instruments.
Abstract: The automatic identification of musical instruments is a relatively unexplored and potentially very important field for its promise to free humans from time-consuming searches on the Internet and indexing of audio material. Speaker identification techniques have been used in this paper to determine the properties (features) which are most effective in identifying a statistically significant number of sounds representing four classes of musical instruments (oboe, sax, clarinet, flute) excerpted from actual performances. Features examined include cepstral coefficients, constant-Q coefficients, spectral centroid, autocorrelation coefficients, and moments of the time wave. The number of these coefficients was varied, and in the case of cepstral coefficients, ten coefficients were sufficient for identification. Correct identifications of 79%-84% were obtained with cepstral coefficients, bin-to-bin differences of the constant-Q coefficients, and autocorrelation coefficients; the latter have not been used previously in either speaker or instrument identification work. These results depended on the training sounds chosen and the number of clusters used in the calculation. Comparison to a human perception experiment with sounds produced by the same instruments indicates that, under these conditions, computers do as well as humans in identifying woodwind instruments.
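To make the abstract's feature vocabulary concrete, here is a small sketch of a plain (non-mel) real cepstrum computed on a synthetic harmonic tone. The paper's actual front-end is not reproduced; the sample rate, tone, and window below are assumptions, and the truncation to ten coefficients mirrors the abstract's finding that ten cepstral coefficients were sufficient.

```python
import numpy as np

# Synthetic harmonic tone: 220 Hz fundamental with 5 decaying partials.
sr = 16000
n = 2048
t = np.arange(n) / sr
tone = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))

# Real cepstrum: inverse FFT of the log magnitude spectrum.
window = np.hanning(n)
spectrum = np.abs(np.fft.rfft(tone * window))
log_spectrum = np.log(spectrum + 1e-10)      # epsilon avoids log(0)
cepstrum = np.fft.irfft(log_spectrum)

coeffs = cepstrum[:10]   # keep the first ten coefficients as features
print(coeffs.shape)
```

The low-order coefficients summarise the smooth spectral envelope (timbre), which is why small cepstral feature vectors work well for instrument identification.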

143 citations


"A Multiple-Expert Framework for Ins..." refers background in this paper

  • ...These concepts regard spectral properties of fixed-length segments [3], or take into account the temporal properties of isolated notes [5]....


Journal Article DOI
C. Joder, Slim Essid, Gaël Richard
TL;DR: A number of methods for early and late temporal integration are proposed and an in-depth experimental study on their interest for the task of musical instrument recognition on solo musical phrases is provided.
Abstract: Nowadays, it appears essential to design automatic indexing tools which provide meaningful and efficient means to describe the musical audio content. There is in fact a growing interest in music information retrieval (MIR) applications, amongst which the most popular are related to music similarity retrieval, artist identification, and musical genre or instrument recognition. Current MIR-related classification systems usually do not take into account the mid-term temporal properties of the signal (over several frames) and rely on the assumption that the observations of the features in different frames are statistically independent. The aim of this paper is to demonstrate the usefulness of the information carried by the evolution of these characteristics over time. To that purpose, we propose a number of methods for early and late temporal integration and provide an in-depth experimental study on their interest for the task of musical instrument recognition on solo musical phrases. In particular, the impact of the time horizon over which the temporal integration is performed will be assessed both for fixed and variable frame length analysis. Also, a number of proposed alignment kernels will be used for late temporal integration. For all experiments, the results are compared to a state of the art musical instrument recognition system.
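The early-versus-late distinction in the abstract can be contrasted in a few lines. This is a toy sketch, not the paper's method: the frame features and frame-wise class probabilities below are mocked with random numbers, and the fusion rules (mean/std pooling for early integration, probability averaging for late integration) are just the simplest representatives of each family.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 13))   # mock phrase: 50 frames x 13 features

# Early integration: summarise the frame sequence into one feature vector
# (mean and standard deviation over time) BEFORE any classifier runs.
early_vector = np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

# Late integration: a classifier scores every frame, then the frame-wise
# class probabilities are fused into one phrase-level decision.
frame_probs = rng.dirichlet(np.ones(4), size=50)   # 4 mock instrument classes
phrase_probs = frame_probs.mean(axis=0)
decision = int(np.argmax(phrase_probs))

print(early_vector.shape)   # (26,)
print(decision)
```

Early integration changes what the classifier sees (one pooled vector per phrase); late integration changes how its per-frame outputs are combined, which is where the paper's alignment kernels come in.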

129 citations


"A Multiple-Expert Framework for Ins..." refers background in this paper

  • ...Later contributions cover aspects such as the pitch dependency of timbre [13] or the temporal integration of spectral features over time [11]....


Proceedings Article
01 Jan 2012
TL;DR: The authors address the identification of predominant music instruments in polytimbral audio by previously dividing the original signal into several streams, and show that the performance was only enhanced if the recognition models are trained with the features extracted from the separated audio streams.
Abstract: The authors address the identification of predominant music instruments in polytimbral audio by previously dividing the original signal into several streams. Several strategies are evaluated, ranging from low to high complexity with respect to the segregation algorithm and the models used for classification. The dataset of interest is built from professionally produced recordings, which typically pose problems to state-of-the-art source separation algorithms. The recognition results are improved by 19% with a simple sound segregation pre-step using only panning information, in comparison to the original algorithm. In order to further improve the results, we evaluated the use of a complex source separation as a pre-step. The results showed that the performance was only enhanced if the recognition models are trained with the features extracted from the separated audio streams. In this way, the typical errors of state-of-the-art separation algorithms are acknowledged, and the performance of the original instrument recognition algorithm is improved by up to 32%.
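The panning-based pre-step in the abstract can be sketched with a toy example. This is an assumption-laden illustration, not the paper's algorithm: two sinusoidal "instruments" are panned left and right, a per-bin panning index is computed from the two channel spectra, and bins dominated by the left channel are kept. Real systems do this frame-wise on an STFT rather than on one whole-signal FFT.

```python
import numpy as np

# Build a toy stereo mix of two tones panned to opposite sides.
sr = 8000
t = np.arange(sr) / sr
src_left = np.sin(2 * np.pi * 330 * t)    # "instrument" panned left
src_right = np.sin(2 * np.pi * 550 * t)   # "instrument" panned right
left = 0.9 * src_left + 0.1 * src_right
right = 0.1 * src_left + 0.9 * src_right

# Panning index per frequency bin: +1 = fully left, -1 = fully right.
L, R = np.fft.rfft(left), np.fft.rfft(right)
panning = (np.abs(L) - np.abs(R)) / (np.abs(L) + np.abs(R) + 1e-12)

# Keep only left-dominated bins, then resynthesise the left stream.
left_est = np.fft.irfft(np.where(panning > 0.2, L, 0.0), n=len(left))

corr = np.corrcoef(left_est, src_left)[0, 1]
print(round(float(corr), 3))   # high correlation with the left source
```

Feeding such segregated streams (instead of the full mix) to an instrument classifier is the "simple sound segregation pre-step" the abstract credits with the 19% improvement.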

92 citations


"A Multiple-Expert Framework for Ins..." refers methods in this paper

  • ...Source separation techniques have also been applied beneficially in this context [2]....
