Proceedings Article

A missing feature approach to instrument identification in polyphonic music

06 Apr 2003 - Vol. 5, pp 49
TL;DR: This work incorporates ideas from missing feature theory into a GMM classifier: frequency regions that are dominated by energy from an interfering tone are marked as unreliable and excluded from the classification process.
Abstract: Summary form only given. Gaussian mixture model (GMM) classifiers have been shown to give good instrument recognition performance for monophonic music played by a single instrument. However, many applications (such as automatic music transcription) require instrument identification from polyphonic, multi-instrumental recordings. We address this problem by incorporating ideas from missing feature theory into a GMM classifier. Specifically, frequency regions that are dominated by energy from an interfering tone are marked as unreliable and excluded from the classification process. This approach has been evaluated on random two-tone chords and an excerpt from a commercially available compact disc, with promising results.
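The exclusion step described in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs and a reliability mask that is already given; how the mask is derived from the interfering tone is not covered here. All names (`masked_gmm_loglik`, `classify`, the toy "flute" and "oboe" models and their parameters) are hypothetical and chosen only for illustration, not taken from the paper.

```python
import math

def log_gauss_1d(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def masked_gmm_loglik(x, mask, weights, means, variances):
    """Log-likelihood of the reliable features of x under a diagonal GMM.

    Features whose mask entry is False are skipped, which marginalises
    them out when the covariance is diagonal.
    """
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, ok, m, v in zip(x, mask, mu, var):
            if ok:
                ll += log_gauss_1d(xi, m, v)
        comp.append(ll)
    top = max(comp)  # log-sum-exp over mixture components
    return top + math.log(sum(math.exp(c - top) for c in comp))

def classify(frames, masks, models):
    """Pick the model with the highest total masked log-likelihood."""
    totals = {
        name: sum(masked_gmm_loglik(x, m, *params)
                  for x, m in zip(frames, masks))
        for name, params in models.items()
    }
    return max(totals, key=totals.get)

# Toy models (assumed values): (weights, means, variances) per instrument.
unit = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
models = {
    "flute": ([0.6, 0.4], [[1.0, 1.0, 1.0], [1.5, 1.5, 1.5]], unit),
    "oboe": ([0.5, 0.5], [[2.0, 5.0, 2.0], [2.5, 5.5, 2.5]], unit),
}

# A flute-like frame whose middle band is dominated by an interfering tone.
frame = [1.0, 5.0, 1.2]
all_reliable = [True, True, True]
middle_unreliable = [True, False, True]

label_unmasked = classify([frame], [all_reliable], models)
label_masked = classify([frame], [middle_unreliable], models)
```

In this toy case the corrupted band pulls the unmasked decision toward the wrong model, while the masked decision relies only on the two reliable bands.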
Citations
Journal Article
01 Dec 2013
TL;DR: Limits of current transcription methods are analyzed and promising directions for future research are identified, including the integration of information from multiple algorithms and different musical aspects.
Abstract: Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.

298 citations


Cites background from "A missing feature approach to instr..."

  • ...A series of approaches incorporate missing feature theory and aim to generate time-frequency masks that indicate spectrotemporal regions that belong only to a particular instrument which can then be classified more accurately since regions that are corrupted by noise or interference are kept out of the classification process [42, 53]....

    [...]

Journal Article
TL;DR: It is demonstrated that, to be successful, music audio signal processing techniques must be informed by a deep and thorough insight into the nature of music itself.
Abstract: Music signal processing may appear to be the junior relation of the large and mature field of speech signal processing, not least because many techniques and representations originally developed for speech have been applied to music, often with good results. However, music signals possess specific acoustic and structural characteristics that distinguish them from spoken language or other nonmusical signals. This paper provides an overview of some signal analysis techniques that specifically address musical dimensions such as melody, harmony, rhythm, and timbre. We will examine how particular characteristics of music signals impact and determine these techniques, and we highlight a number of novel music analysis and retrieval tasks that such processing makes possible. Our goal is to demonstrate that, to be successful, music audio signal processing techniques must be informed by a deep and thorough insight into the nature of music itself.

246 citations

Book
14 Aug 2012
TL;DR: This book provides quick access to different analysis algorithms and allows comparison between different approaches to the same task, making it useful for newcomers to audio signal processing and industry experts alike.
Abstract: With the proliferation of digital audio distribution over digital media, audio content analysis is fast becoming a requirement for designers of intelligent signal-adaptive audio processing systems. Written by a well-known expert in the field, this book provides quick access to different analysis algorithms and allows comparison between different approaches to the same task, making it useful for newcomers to audio signal processing and industry experts alike. A review of relevant fundamentals in audio signal processing, psychoacoustics, and music theory, as well as downloadable MATLAB files are also included. Please visit the companion website: www.AudioContentAnalysis.org

184 citations

Journal Article
TL;DR: This study focuses on a single music genre but combines a variety of instruments among which are percussion and singing voice, and obtains a taxonomy of musical ensembles which is used to efficiently classify possible combinations of instruments played simultaneously.
Abstract: We propose a new approach to instrument recognition in the context of real music orchestrations ranging from solos to quartets. The strength of our approach is that it does not require prior musical source separation. Thanks to a hierarchical clustering algorithm exploiting robust probabilistic distances, we obtain a taxonomy of musical ensembles which is used to efficiently classify possible combinations of instruments played simultaneously. Moreover, a wide set of acoustic features is studied including some new proposals. In particular, signal to mask ratios are found to be useful features for audio classification. This study focuses on a single music genre (i.e., jazz) but combines a variety of instruments among which are percussion and singing voice. Using a varied database of sound excerpts from commercial recordings, we show that the segmentation of music with respect to the instruments played can be achieved with an average accuracy of 53%.

133 citations


Cites background from "A missing feature approach to instr..."

  • ...The success of this task is then intimately connected to the efficiency of the extraction of multiple fundamental frequencies, which is known to be a very difficult problem, especially for octave-related notes....

    [...]

Journal Article
TL;DR: It is shown that higher recognition rates can be reached with pairwise optimized subsets of features in association with SVM classification using a radial basis function kernel.
Abstract: Musical instrument recognition is an important aspect of music information retrieval. In this paper, statistical pattern recognition techniques are utilized to tackle the problem in the context of solo musical phrases. Ten instrument classes from different instrument families are considered. A large sound database is collected from excerpts of musical phrases acquired from commercial recordings translating different instrument instances, performers, and recording conditions. More than 150 signal processing features are studied including new descriptors. Two feature selection techniques, inertia ratio maximization with feature space projection and genetic algorithms are considered in a class pairwise manner whereby the most relevant features are fetched for each instrument pair. For the classification task, experimental results are provided using Gaussian mixture models (GMMs) and support vector machines (SVMs). It is shown that higher recognition rates can be reached with pairwise optimized subsets of features in association with SVM classification using a radial basis function kernel.

111 citations


Cites background from "A missing feature approach to instr..."

  • ...However, identifying instruments from complex mixtures involving more than one playing at a time remains a very difficult problem that has been addressed in a very few studies [2], [3], [4], [5], [6] with often important restrictions regarding the musical content with respect to instruments involved and played notes....

    [...]

References
Journal Article
TL;DR: An approach to robust ASR which acknowledges that some spectro-temporal regions will be dominated by noise, and which introduces two ways of dealing with unreliable evidence: marginalisation and state-based data imputation.

704 citations


"A missing feature approach to instr..." refers background in this paper

  • ...This idea is motivated by a model of auditory perception which proposes that listeners are able to recognise partially masked sounds from an incomplete acoustic representation [2]....

    [...]

Dissertation
01 Jan 1999
TL;DR: A computer model of the recognition process is developed that is capable of “listening” to a recording of a musical instrument and classifying the instrument as one of 25 possibilities, based on current models of signal processing in the human auditory system.
Abstract: The ability of a normal human listener to recognize objects in the environment from only the sounds they produce is extraordinarily robust with regard to characteristics of the acoustic environment and of other competing sound sources. In contrast, computer systems designed to recognize sound sources function precariously, breaking down whenever the target sound is degraded by reverberation, noise, or competing sounds. Robust listening requires extensive contextual knowledge, but the potential contribution of sound-source recognition to the process of auditory scene analysis has largely been neglected by researchers building computational models of the scene analysis process. This thesis proposes a theory of sound-source recognition, casting recognition as a process of gathering information to enable the listener to make inferences about objects in the environment or to predict their behavior. In order to explore the process, attention is restricted to isolated sounds produced by a small class of sound sources, the non-percussive orchestral musical instruments. Previous research on the perception and production of orchestral instrument sounds is reviewed from a vantage point based on the excitation and resonance structure of the sound-production process, revealing a set of perceptually salient acoustic features. A computer model of the recognition process is developed that is capable of “listening” to a recording of a musical instrument and classifying the instrument as one of 25 possibilities. The model is based on current models of signal processing in the human auditory system. It explicitly extracts salient acoustic features and uses a novel improvisational taxonomic architecture (based on simple statistical pattern-recognition techniques) to classify the sound source. The performance of the model is compared directly to that of skilled human listeners, using

206 citations


"A missing feature approach to instr..." refers methods in this paper

  • ...First, only frame-based spectral features are used in our system, rather than features that encode the temporal evolution at the onset of tones (e.g., see [5], [7])....

    [...]

01 Jan 1983

146 citations


"A missing feature approach to instr..." refers methods in this paper

  • ...The F0 analysis used here is based on the notion of 'harmonic sieves' [9]....

    [...]
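The harmonic-sieve idea mentioned in the excerpt above can be sketched generically: a candidate F0 defines a set of "slots" at its integer multiples, and spectral peaks either fall through a slot or are rejected. The scoring rule, the tolerance, and the function names below are assumptions made for illustration and are not the cited method or the paper's actual F0 analysis.

```python
def sieve_score(f0, peaks, tol=0.03, max_harmonics=10):
    """Score how well spectral peaks (Hz) fit a harmonic sieve at f0.

    The score is the fraction of sieve slots that catch a peak multiplied
    by the fraction of peaks caught by some slot, so both sub-octave and
    super-octave candidates are penalised.
    """
    if not peaks:
        return 0.0
    top = max(peaks) * (1 + tol)
    slots = [h * f0 for h in range(1, max_harmonics + 1) if h * f0 <= top]
    if not slots:
        return 0.0
    # A slot is filled if some peak lies within a relative tolerance of it.
    filled = sum(any(abs(p - s) <= tol * s for p in peaks) for s in slots)
    # A peak is explained if it falls through some slot of the sieve.
    explained = sum(any(abs(p - s) <= tol * s for s in slots) for p in peaks)
    return (filled / len(slots)) * (explained / len(peaks))

def estimate_f0(peaks, candidates, tol=0.03, max_harmonics=10):
    """Return the candidate F0 whose sieve best matches the peaks."""
    return max(candidates,
               key=lambda f0: sieve_score(f0, peaks, tol, max_harmonics))

# Slightly mistuned harmonics of a 220 Hz tone (made-up example values).
peaks = [220.0, 441.0, 659.0, 880.5]
best = estimate_f0(peaks, [110.0, 220.0, 440.0])
```

With these example peaks the 110 Hz sieve leaves half its slots empty and the 440 Hz sieve leaves half the peaks unexplained, so 220 Hz scores highest.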

Journal Article
TL;DR: Comparison to a human perception experiment with sounds produced by the same instruments indicates that, under these conditions, computers do as well as humans in identifying woodwind instruments.
Abstract: The automatic identification of musical instruments is a relatively unexplored and potentially very important field for its promise to free humans from time-consuming searches on the Internet and indexing of audio material. Speaker identification techniques have been used in this paper to determine the properties (features) which are most effective in identifying a statistically significant number of sounds representing four classes of musical instruments (oboe, sax, clarinet, flute) excerpted from actual performances. Features examined include cepstral coefficients, constant-Q coefficients, spectral centroid, autocorrelation coefficients, and moments of the time wave. The number of these coefficients was varied, and in the case of cepstral coefficients, ten coefficients were sufficient for identification. Correct identifications of 79%-84% were obtained with cepstral coefficients, bin-to-bin differences of the constant-Q coefficients, and autocorrelation coefficients; the latter have not been used previously in either speaker or instrument identification work. These results depended on the training sounds chosen and the number of clusters used in the calculation. Comparison to a human perception experiment with sounds produced by the same instruments indicates that, under these conditions, computers do as well as humans in identifying woodwind instruments.

143 citations


"A missing feature approach to instr..." refers methods in this paper

  • ...Here, we assume a diagonal covariance matrix; although this embodies an assumption which is incorrect (independence of features) it is a widely used simplification (e.g., see [1])....

    [...]

  • ...Secondly, although cepstral coefficients have been used successfully as features for instrument identification [1], [6], they are not used here because they do not fit naturally into the missing feature approach....

    [...]
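The two excerpts above can be connected with a short derivation. The notation here is an assumption for illustration ($K$ components with weights $w_k$, $D$ spectral features, reliable index set $R$ and unreliable set $U$) and is not taken from the paper. With a diagonal covariance matrix, each mixture component factorises over features, so integrating out the unreliable features simply removes their factors:

```latex
\begin{align}
p(\mathbf{x} \mid \lambda)
  &= \sum_{k=1}^{K} w_k \prod_{d=1}^{D}
     \mathcal{N}\!\left(x_d;\, \mu_{k,d},\, \sigma_{k,d}^{2}\right) \\
p(\mathbf{x}_R \mid \lambda)
  &= \int p(\mathbf{x} \mid \lambda)\, d\mathbf{x}_U
   = \sum_{k=1}^{K} w_k \prod_{d \in R}
     \mathcal{N}\!\left(x_d;\, \mu_{k,d},\, \sigma_{k,d}^{2}\right)
\end{align}
```

The second line follows because each one-dimensional Gaussian over an unreliable feature integrates to one. One plausible reading of why cepstral coefficients fit poorly into this scheme is that each coefficient is a weighted sum over all frequency bands of the log spectrum, for example $c_n = \sum_{d=1}^{D} a_{n,d} \log S_d$ for a cosine transform with weights $a_{n,d}$, so a single unreliable band affects every coefficient and no per-feature mask isolates it.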

01 Jan 1999

110 citations


"A missing feature approach to instr..." refers methods in this paper

  • ...Secondly, although cepstral coefficients have been used successfully as features for instrument identification [1], [6], they are not used here because they do not fit naturally into the missing feature approach....

    [...]