Book Chapter DOI

A Multiple-Expert Framework for Instrument Recognition

TL;DR: A new approach to feature-based instrument recognition is presented that exploits redundancies in the harmonic structure and temporal development of a note and is designed to transfer to polyphonic material.
Abstract: Instrument recognition is an important task in music information retrieval (MIR). Whereas the recognition of musical instruments in monophonic recordings has been studied widely, the polyphonic case is still far from solved. A new approach to feature-based instrument recognition is presented that makes use of redundancies in the harmonic structure and temporal development of a note. The structure of the proposed method is designed to transfer to polyphonic material. Multiple feature categories are extracted and classified separately with SVM models. In a further step, class probabilities are aggregated in a two-step combination scheme. The presented system was evaluated on a dataset of 3300 isolated single notes, and different aggregation methods are compared. As the joint classification outperforms the individual categories, further development of the presented technique is motivated.
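The abstract's core idea, separate per-category experts whose class probabilities are later fused, can be illustrated with a small sketch. The paper's exact two-step combination scheme is not reproduced here; the category names, the three-class setup, and the probability values below are all illustrative assumptions, and only two generic fusion rules (mean and product) are shown.

```python
import numpy as np

# Mock per-category class probabilities for one note, as if produced by
# three separately trained SVM experts (category names are hypothetical).
expert_probs = {
    "spectral": np.array([0.6, 0.3, 0.1]),   # e.g. harmonic-structure features
    "temporal": np.array([0.5, 0.4, 0.1]),   # e.g. temporal-development features
    "frame":    np.array([0.4, 0.2, 0.4]),
}

def aggregate(probs, rule="mean"):
    """Fuse expert probability vectors into a single class distribution."""
    stacked = np.vstack(list(probs.values()))
    if rule == "mean":
        fused = stacked.mean(axis=0)
    elif rule == "product":
        fused = stacked.prod(axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return fused / fused.sum()  # renormalise to a valid distribution

for rule in ("mean", "product"):
    fused = aggregate(expert_probs, rule)
    print(rule, fused.round(3), "->", int(np.argmax(fused)))
```

With these mock values both rules agree on the winning class; in general the product rule rewards experts that are jointly confident, while the mean rule is more robust to a single badly calibrated expert.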
Citations
Proceedings Article
01 Jan 2018
TL;DR: This paper builds upon a recently proposed instrument recognition algorithm based on a hybrid deep neural network: a combination of convolutional and fully connected layers for learning characteristic spectral-temporal patterns.
Abstract: Predominant instrument recognition in ensemble recordings remains a challenging task, particularly if closely related instruments such as alto and tenor saxophone need to be distinguished. In this paper, we build upon a recently proposed instrument recognition algorithm based on a hybrid deep neural network: a combination of convolutional and fully connected layers for learning characteristic spectral-temporal patterns. We systematically evaluate harmonic/percussive and solo/accompaniment source separation algorithms as pre-processing steps to reduce the overlap among multiple instruments prior to the instrument recognition step. For the particular use-case of solo instrument recognition in jazz ensemble recordings, we further apply transfer learning techniques to fine-tune a previously trained instrument recognition model for classifying six jazz solo instruments. Our results indicate that both source separation as pre-processing step as well as transfer learning clearly improve recognition performance, especially for smaller subsets of highly similar instruments.

30 citations


Cites background or methods from "A Multiple-Expert Framework for Ins..."

  • ...Considering classification scenarios with more than 10 instrument classes, the best-performing systems achieve recognition rates above 90%, as shown for instance in [14, 27]....


  • ...ing on note-wise, frame-wise, and envelope-wise features was proposed in [14]....


Dissertation
23 Oct 2014
TL;DR: Evaluation experiments on two newly created audio datasets show that the presented transcription algorithm can achieve higher recognition accuracy on a dataset of realistic bass guitar recordings.
Abstract: Music signals typically consist of a superposition of several individual instruments. Most existing algorithms for the automatic transcription and analysis of music recordings in the research field of music information retrieval (MIR) attempt to extract semantic information directly from these mixed signals. In recent years it has frequently been observed that the performance of these algorithms is generally limited by the signal superposition and the resulting loss of information. One possible solution is to isolate the participating instruments acoustically before analysis by means of source separation techniques. At the current state of the art, however, the performance of these algorithms is not always sufficient to achieve a very good separation of the individual sources. This thesis therefore investigates only isolated instrument recordings that are not acoustically overlaid by other instruments. Using the electric bass guitar as an example, analysis and sound synthesis algorithms specialized in the sound production of this instrument are developed and evaluated. In the first part of this thesis, an algorithm is presented that performs an automatic transcription of bass guitar recordings. The audio signal is described by sound events that correspond to the notes played on the instrument. In addition to the usual note parameters onset, duration, loudness, and pitch, instrument-specific parameters such as the playing techniques used as well as the string and fret position on the instrument are extracted automatically.
Evaluation experiments on two newly created audio datasets show that the presented transcription algorithm achieves higher recognition accuracy on a dataset of realistic bass guitar recordings than three existing state-of-the-art algorithms. The estimation of the instrument-specific parameters can be performed with high quality, particularly for isolated single notes. The second part of the thesis investigates how the music genre can be inferred from a note representation of typical repeating bass lines. Audio features are extracted that quantitatively describe various tonal, rhythmic, and structural properties of bass lines. Using a newly created dataset of 520 typical bass lines from 13 different music genres, three different approaches to automatic genre classification were compared. It was shown that, using a rule-based classification method, a mean recognition rate of 64.8% could already be achieved from the analysis of a piece's bass line alone. The re-synthesis of the original bass tracks based on the extracted note parameters is investigated in the third part of the thesis. A new audio synthesis algorithm is presented that, based on the principle of physical modeling, reproduces various aspects of the sound production characteristic of the bass guitar, such as string excitation, damping, collision between string and fret, and the pickup behavior. Furthermore, a parametric audio coding approach is discussed that allows bass guitar tracks to be transmitted using only the extracted note-wise parameters and re-synthesized on the decoder side.
The results of several listening tests confirm that the proposed synthesis algorithm enables a re-synthesis of bass guitar recordings with better sound quality than transmitting the audio data with existing audio coding methods configured for very low bit rates.

10 citations


Cites methods from "A Multiple-Expert Framework for Ins..."

  • ...Instead of using the proposed feature set for the classification of plucking and expression styles, Mel-Frequency Cepstral Coefficients (MFCC) [62] were chosen as audio features since they are widely applied for comparable MIR classification tasks such as instrument recognition [71]....


  • ...For this purpose, partial tracking algorithms based on detecting spectral peaks and tracking them over time are most-often used in the literature (see for instance [71])....


Book Chapter DOI
14 Oct 2019
TL;DR: In this paper, a new dataset of Colombian Andean string music was compiled and annotated by musicological experts, and different neural network architectures, as well as pre-processing steps and data augmentation techniques were systematically evaluated and optimized.
Abstract: Reliable methods for automatic retrieval of semantic information from large digital music archives can play a critical role in musicological research and musical heritage preservation. With the advancement of machine learning techniques, new possibilities for information retrieval in scenarios where ground-truth data is scarce are now available. This work investigates the problem of ensemble size classification in music recordings. For this purpose, a new dataset of Colombian Andean string music was compiled and annotated by musicological experts. Different neural network architectures, as well as pre-processing steps and data augmentation techniques were systematically evaluated and optimized. The best deep neural network architecture achieved 81.5% file-wise mean class accuracy using only feed forward layers with linear magnitude spectrograms as input representation. This model will serve as a baseline for future research on ensemble size classification.

3 citations

References
Journal Article DOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
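The probability estimates mentioned in the abstract are easy to exercise through scikit-learn, whose `SVC` class is backed by LIBSVM. The toy one-dimensional data below is an assumption for illustration; `probability=True` switches on LIBSVM's cross-validated (Platt-scaling) probability outputs.

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn's SVC wraps LIBSVM

# Three well-separated 1-D clusters as mock training data (not from the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.15, size=(20, 1)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 20)

# probability=True enables LIBSVM's internal probability calibration.
clf = SVC(kernel="rbf", C=10.0, gamma="scale", probability=True,
          random_state=0).fit(X, y)

probs = clf.predict_proba([[0.05]])[0]   # one distribution over 3 classes
print(probs.round(3))
print("predicted class:", clf.predict([[0.05]])[0])
```

Note that `predict` uses the one-vs-one decision values directly, so on borderline points it can occasionally disagree with the argmax of `predict_proba`; the calibration is fitted separately.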

40,826 citations

Proceedings Article DOI
21 Oct 2001
TL;DR: Several features were compared with regard to recognition performance in a musical instrument recognition system and the confusions made by the system were analysed and compared to results reported in a human perception experiment.
Abstract: Several features were compared with regard to recognition performance in a musical instrument recognition system. Both mel-frequency and linear prediction cepstral and delta cepstral coefficients were calculated. Linear prediction analysis was carried out both on a uniform and a warped frequency scale, and reflection coefficients were also used as features. The performance of earlier described features relating to the temporal development, modulation properties, brightness, and spectral synchrony of sounds was also analysed. The data base consisted of 5286 acoustic and synthetic solo tones from 29 different Western orchestral instruments, out of which 16 instruments were included in the test set. The best performance for solo tone recognition, 35% for individual instruments and 77% for families, was obtained with a feature set consisting of two sets of mel-frequency cepstral coefficients and a subset of the other analysed features. The confusions made by the system were analysed and compared to results reported in a human perception experiment.

150 citations


"A Multiple-Expert Framework for Ins..." refers background in this paper

  • ...These concepts regard spectral properties of fixed-length segments [3], or take into account the temporal properties of isolated notes [5]....


Journal Article DOI
TL;DR: Comparison to a human perception experiment with sounds produced by the same instruments indicates that, under these conditions, computers do as well as humans in identifying woodwind instruments.
Abstract: The automatic identification of musical instruments is a relatively unexplored and potentially very important field for its promise to free humans from time-consuming searches on the Internet and indexing of audio material. Speaker identification techniques have been used in this paper to determine the properties (features) which are most effective in identifying a statistically significant number of sounds representing four classes of musical instruments (oboe, sax, clarinet, flute) excerpted from actual performances. Features examined include cepstral coefficients, constant-Q coefficients, spectral centroid, autocorrelation coefficients, and moments of the time wave. The number of these coefficients was varied, and in the case of cepstral coefficients, ten coefficients were sufficient for identification. Correct identifications of 79%-84% were obtained with cepstral coefficients, bin-to-bin differences of the constant-Q coefficients, and autocorrelation coefficients; the latter have not been used previously in either speaker or instrument identification work. These results depended on the training sounds chosen and the number of clusters used in the calculation. Comparison to a human perception experiment with sounds produced by the same instruments indicates that, under these conditions, computers do as well as humans in identifying woodwind instruments.
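To make the abstract's feature vocabulary concrete, here is a small sketch of a plain (non-mel) real cepstrum computed on a synthetic harmonic tone. The paper's actual front-end is not reproduced; the sample rate, tone, and window below are assumptions, and the truncation to ten coefficients mirrors the abstract's finding that ten cepstral coefficients were sufficient.

```python
import numpy as np

# Synthetic harmonic tone: 220 Hz fundamental with 5 decaying partials.
sr = 16000
n = 2048
t = np.arange(n) / sr
tone = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))

# Real cepstrum: inverse FFT of the log magnitude spectrum.
window = np.hanning(n)
spectrum = np.abs(np.fft.rfft(tone * window))
log_spectrum = np.log(spectrum + 1e-10)      # epsilon avoids log(0)
cepstrum = np.fft.irfft(log_spectrum)

coeffs = cepstrum[:10]   # keep the first ten coefficients as features
print(coeffs.shape)
```

The low-order coefficients summarise the smooth spectral envelope (timbre), which is why small cepstral feature vectors work well for instrument identification.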

143 citations


"A Multiple-Expert Framework for Ins..." refers background in this paper

  • ...These concepts regard spectral properties of fixed-length segments [3], or take into account the temporal properties of isolated notes [5]....


Journal Article DOI
C. Joder, Slim Essid, Gaël Richard
TL;DR: A number of methods for early and late temporal integration are proposed and an in-depth experimental study on their interest for the task of musical instrument recognition on solo musical phrases is provided.
Abstract: Nowadays, it appears essential to design automatic indexing tools which provide meaningful and efficient means to describe the musical audio content. There is in fact a growing interest in music information retrieval (MIR) applications, amongst which the most popular are related to music similarity retrieval, artist identification, and musical genre or instrument recognition. Current MIR-related classification systems usually do not take into account the mid-term temporal properties of the signal (over several frames) and rely on the assumption that the observations of the features in different frames are statistically independent. The aim of this paper is to demonstrate the usefulness of the information carried by the evolution of these characteristics over time. To that purpose, we propose a number of methods for early and late temporal integration and provide an in-depth experimental study on their interest for the task of musical instrument recognition on solo musical phrases. In particular, the impact of the time horizon over which the temporal integration is performed will be assessed both for fixed and variable frame length analysis. Also, a number of proposed alignment kernels will be used for late temporal integration. For all experiments, the results are compared to a state of the art musical instrument recognition system.
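The early-versus-late distinction in the abstract can be contrasted in a few lines. This is a toy sketch, not the paper's method: the frame features and frame-wise class probabilities below are mocked with random numbers, and the fusion rules (mean/std pooling for early integration, probability averaging for late integration) are just the simplest representatives of each family.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 13))   # mock phrase: 50 frames x 13 features

# Early integration: summarise the frame sequence into one feature vector
# (mean and standard deviation over time) BEFORE any classifier runs.
early_vector = np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

# Late integration: a classifier scores every frame, then the frame-wise
# class probabilities are fused into one phrase-level decision.
frame_probs = rng.dirichlet(np.ones(4), size=50)   # 4 mock instrument classes
phrase_probs = frame_probs.mean(axis=0)
decision = int(np.argmax(phrase_probs))

print(early_vector.shape)   # (26,)
print(decision)
```

Early integration changes what the classifier sees (one pooled vector per phrase); late integration changes how its per-frame outputs are combined, which is where the paper's alignment kernels come in.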

129 citations


"A Multiple-Expert Framework for Ins..." refers background in this paper

  • ...Later contributions cover aspects such as the pitch dependency of timbre [13] or the temporal integration of spectral features over time [11]....


Proceedings Article
01 Jan 2012
TL;DR: The authors address the identification of predominant music instruments in polytimbral audio by previously dividing the original signal into several streams, and show that the performance was only enhanced if the recognition models are trained with the features extracted from the separated audio streams.
Abstract: The authors address the identification of predominant music instruments in polytimbral audio by previously dividing the original signal into several streams. Several strategies are evaluated, ranging from low to high complexity with respect to the segregation algorithm and the models used for classification. The dataset of interest is built from professionally produced recordings, which typically pose problems to state-of-the-art source separation algorithms. The recognition results are improved by 19% with a simple sound segregation pre-step using only panning information, in comparison to the original algorithm. In order to further improve the results, we evaluated the use of a complex source separation as a pre-step. The results showed that the performance was only enhanced if the recognition models are trained with the features extracted from the separated audio streams. In this way, the typical errors of state-of-the-art separation algorithms are acknowledged, and the performance of the original instrument recognition algorithm is improved by up to 32%.
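The panning-based pre-step in the abstract can be sketched with a toy example. This is an assumption-laden illustration, not the paper's algorithm: two sinusoidal "instruments" are panned left and right, a per-bin panning index is computed from the two channel spectra, and bins dominated by the left channel are kept. Real systems do this frame-wise on an STFT rather than on one whole-signal FFT.

```python
import numpy as np

# Build a toy stereo mix of two tones panned to opposite sides.
sr = 8000
t = np.arange(sr) / sr
src_left = np.sin(2 * np.pi * 330 * t)    # "instrument" panned left
src_right = np.sin(2 * np.pi * 550 * t)   # "instrument" panned right
left = 0.9 * src_left + 0.1 * src_right
right = 0.1 * src_left + 0.9 * src_right

# Panning index per frequency bin: +1 = fully left, -1 = fully right.
L, R = np.fft.rfft(left), np.fft.rfft(right)
panning = (np.abs(L) - np.abs(R)) / (np.abs(L) + np.abs(R) + 1e-12)

# Keep only left-dominated bins, then resynthesise the left stream.
left_est = np.fft.irfft(np.where(panning > 0.2, L, 0.0), n=len(left))

corr = np.corrcoef(left_est, src_left)[0, 1]
print(round(float(corr), 3))   # high correlation with the left source
```

Feeding such segregated streams (instead of the full mix) to an instrument classifier is the "simple sound segregation pre-step" the abstract credits with the 19% improvement.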

92 citations


"A Multiple-Expert Framework for Ins..." refers methods in this paper

  • ...Source separation techniques have also been applied beneficially in this context [2]....
