Author

C. Duxbury

Bio: C. Duxbury is an academic researcher from the University of London. The author has contributed to research in the topics of Signal and Detection theory. The author has an h-index of 1 and has co-authored 1 publication receiving 752 citations.

Papers
Journal ArticleDOI
TL;DR: Discusses methods based on explicitly predefined signal features (the signal's amplitude envelope, spectral magnitudes and phases, and time-frequency representations) and methods based on probabilistic signal models.
Abstract: Note onset detection and localization is useful in a number of analysis and indexing techniques for musical signals. The usual way to detect onsets is to look for "transient" regions in the signal, a notion that leads to many definitions: a sudden burst of energy, a change in the short-time spectrum of the signal or in the statistical properties, etc. The goal of this paper is to review, categorize, and compare some of the most commonly used techniques for onset detection, and to present possible enhancements. We discuss methods based on the use of explicitly predefined signal features: the signal's amplitude envelope, spectral magnitudes and phases, time-frequency representations; and methods based on probabilistic signal models: model-based change point detection, surprise signals, etc. Using a choice of test cases, we provide some guidelines for choosing the appropriate method for a given application.
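The simplest members of the feature-based family surveyed here reduce to computing a frame-wise novelty curve and picking its peaks. As a hedged illustration (a minimal sketch, not the paper's exact formulation; the frame length, hop size, and threshold below are arbitrary choices), a spectral-flux detector in NumPy:

```python
# Minimal spectral-flux onset detection sketch (NumPy only).
# Frame the signal, take magnitude spectra, and sum the positive
# spectral differences between consecutive frames; peaks in the
# resulting detection function are candidate onsets.
import numpy as np

def spectral_flux_onsets(x, frame_len=1024, hop=512, threshold=None):
    # Short-time magnitude spectrogram via a Hann-windowed DFT.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    mags = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        mags[i] = np.abs(np.fft.rfft(frame))

    # Spectral flux: half-wave rectified frame-to-frame increase
    # in magnitude, summed over frequency bins.
    diff = np.maximum(mags[1:] - mags[:-1], 0.0)
    flux = diff.sum(axis=1)

    # Simple global threshold: mean plus one standard deviation.
    if threshold is None:
        threshold = flux.mean() + flux.std()

    # Report local maxima above the threshold as onset frames.
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > threshold
             and flux[i] >= flux[i - 1] and flux[i] >= flux[i + 1]]
    return np.array(peaks) + 1  # +1: flux[i] compares frame i+1 to frame i

# Example: detect the onset of a noise burst in an otherwise silent signal.
sr = 22050
x = np.zeros(sr)
x[sr // 2 :] = 0.5 * np.random.randn(sr - sr // 2)  # noise starts mid-signal
print(spectral_flux_onsets(x) * 512 / sr)  # onset times in seconds
```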

802 citations


Cited by
Journal ArticleDOI
TL;DR: The approaches used in music information retrieval (MIR) are multidisciplinary: library and information science, musicology, music theory, audio engineering, computer science, law, and commerce...
Abstract: The approaches used in music information retrieval (MIR) are multidisciplinary: library and information science, musicology, music theory, audio engineering, computer science, law, and commerce. The article aims to identify and explain the challenges of MIR as it becomes a discipline in its own right, its historical influences, the state of the art of research, and potential solutions. Music information is multifaceted - pitch, temporality, harmony, timbre, editorial information, text, and bibliography - with access to each of these facets posing a research and development challenge. But MIR also represents a multirepresentational, multicultural, multi-experiential, and multidisciplinary challenge. MIR systems deploy varying degrees of representational completeness; they generally fall into two types: analytical or production systems, and locating systems. In conclusion, the author mentions several recent workshops and symposia as well as the main research projects concerning MIR.

372 citations

Journal ArticleDOI
01 Dec 2013
TL;DR: Limits of current transcription methods are analyzed and promising directions for future research are identified, including the integration of information from multiple algorithms and different musical aspects.
Abstract: Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.
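As a rough illustration of the forced-alignment idea mentioned above, the sketch below aligns two feature sequences with plain dynamic time warping. The feature representation, step pattern, and toy data are all assumptions for illustration, not the paper's method:

```python
# Toy dynamic-time-warping (DTW) alignment sketch, illustrating the kind of
# score-to-audio forced alignment proposed as a source of training data.
# Both "score" and "audio" are stand-in feature sequences (e.g. chroma
# vectors); real systems use far richer features and constraints.
import numpy as np

def dtw_path(score_feats, audio_feats):
    n, m = len(score_feats), len(audio_feats)
    # Pairwise Euclidean cost between score frames and audio frames.
    cost = np.linalg.norm(score_feats[:, None, :] - audio_feats[None, :, :], axis=2)

    # Accumulated cost with the standard step pattern (down, right, diagonal).
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(acc[i - 1, j] if i > 0 else np.inf,
                       acc[i, j - 1] if j > 0 else np.inf,
                       acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            acc[i, j] = cost[i, j] + prev

    # Backtrack from the end to recover the warping path.
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while (i, j) != (0, 0):
        candidates = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
        i, j = min((c for c in candidates if c[0] >= 0 and c[1] >= 0),
                   key=lambda c: acc[c])
        path.append((i, j))
    return path[::-1]  # list of (score_frame, audio_frame) pairs

# Example: align a 5-frame "score" to a time-stretched 8-frame "audio" rendition.
score = np.linspace(0, 1, 5)[:, None] * np.ones((1, 12))
audio = np.linspace(0, 1, 8)[:, None] * np.ones((1, 12))
print(dtw_path(score, audio))
```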

298 citations

Journal ArticleDOI
TL;DR: An algorithm that predicts musical genre and artist from an audio waveform using the ensemble learner ADABOOST; evidence collected from a variety of popular features and classifiers shows that classifying features aggregated over segments of audio outperforms classifying either entire songs or individual short-timescale features.
Abstract: We present an algorithm that predicts musical genre and artist from an audio waveform. Our method uses the ensemble learner ADABOOST to select from a set of audio features that have been extracted from segmented audio and then aggregated. Our classifier proved to be the most effective method for genre classification at the recent MIREX 2005 international contests in music information extraction, and the second-best method for recognizing artists. This paper describes our method in detail, from feature extraction to song classification, and presents an evaluation of our method on three genre databases and two artist-recognition databases. Furthermore, we present evidence collected from a variety of popular features and classifiers that the technique of classifying features aggregated over segments of audio is better than classifying either entire songs or individual short-timescale features.
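A minimal sketch of the segment-level classify-then-vote scheme, using scikit-learn's AdaBoostClassifier on synthetic stand-in features; the paper's actual feature set, aggregation windows, and databases are not reproduced here:

```python
# Sketch of segment-level classification with AdaBoost followed by a
# majority vote per song. The features are synthetic stand-ins for
# aggregated audio features (e.g. per-segment means/variances of MFCCs).
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
n_songs, segs_per_song, n_feats = 40, 10, 20

# Two synthetic "genres": the class label shifts each song's segment features.
song_labels = rng.integers(0, 2, size=n_songs)
X = rng.normal(size=(n_songs, segs_per_song, n_feats)) + song_labels[:, None, None]
y = np.repeat(song_labels, segs_per_song)        # one label per segment

# Train on segments from the first 30 songs.
train = X[:30].reshape(-1, n_feats)
clf = AdaBoostClassifier(n_estimators=100).fit(train, y[: 30 * segs_per_song])

# Classify each held-out song by majority vote over its segments.
for s in range(30, n_songs):
    votes = clf.predict(X[s])
    pred = np.bincount(votes).argmax()
    print(f"song {s}: predicted {pred}, true {song_labels[s]}")
```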

296 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper presents an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick, using a recurrent neural network to predict sound features from videos and then producing a waveform from these features with an example-based synthesis procedure.
Abstract: Objects make distinctive sounds when they are hit or scratched. These sounds reveal aspects of an object's material properties, as well as the actions that produced them. In this paper, we propose the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene. We present an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick. This algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We show that the sounds predicted by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that they convey significant information about material properties and physical interactions.
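A hedged sketch of the video-to-sound-features stage only: a recurrent network regresses a sequence of audio feature vectors from per-frame visual features. All dimensions and the training objective below are illustrative assumptions, and the example-based waveform synthesis stage is omitted:

```python
# Illustrative LSTM that maps a sequence of per-frame visual features to a
# sequence of audio feature vectors, trained as a simple regression.
import torch
import torch.nn as nn

class SoundFeaturePredictor(nn.Module):
    def __init__(self, visual_dim=512, hidden_dim=256, audio_dim=42):
        super().__init__()
        self.rnn = nn.LSTM(visual_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, audio_dim)  # per-step audio features

    def forward(self, frames):        # frames: (batch, time, visual_dim)
        h, _ = self.rnn(frames)
        return self.head(h)           # (batch, time, audio_dim)

model = SoundFeaturePredictor()
video = torch.randn(4, 30, 512)       # 4 clips, 30 frames of visual features each
target = torch.randn(4, 30, 42)       # stand-in ground-truth audio features
loss = nn.functional.mse_loss(model(video), target)
loss.backward()                        # one regression training step
print(loss.item())
```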

284 citations

Proceedings ArticleDOI
27 Oct 2006
TL;DR: Initial experiments show that the algorithm can successfully detect harmonic changes such as chord boundaries in polyphonic audio recordings.
Abstract: We propose a novel method for detecting changes in the harmonic content of musical audio signals. Our method uses a new model for Equal Tempered Pitch Class Space. This model maps 12-bin chroma vectors to the interior space of a 6-D polytope; pitch classes are mapped onto the vertices of this polytope. Close harmonic relations such as fifths and thirds appear as small Euclidean distances. We calculate the Euclidean distance between analysis frames n+1 and n-1 to develop a harmonic change measure for frame n. A peak in the detection function denotes a transition from one harmonically stable region to another. Initial experiments show that the algorithm can successfully detect harmonic changes such as chord boundaries in polyphonic audio recordings.
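A minimal sketch of the described pipeline: project each 12-bin chroma vector into the 6-D space, then take the distance between frames n+1 and n-1 as the detection value at frame n. The radii and angular spacings below follow the common tonal-centroid construction (circles of fifths, minor thirds, and major thirds) and are assumptions rather than values quoted from the paper:

```python
# Chroma-to-6-D mapping plus frame-differencing detection function.
import numpy as np

L = np.arange(12)                                # pitch classes 0..11
PHI = np.vstack([
    np.sin(L * 7 * np.pi / 6), np.cos(L * 7 * np.pi / 6),              # fifths
    np.sin(L * 3 * np.pi / 2), np.cos(L * 3 * np.pi / 2),              # minor thirds
    0.5 * np.sin(L * 2 * np.pi / 3), 0.5 * np.cos(L * 2 * np.pi / 3),  # major thirds
])                                               # shape (6, 12)

def harmonic_change(chroma):
    """chroma: (n_frames, 12) array -> detection value per inner frame."""
    # Normalise each chroma vector, then project into the 6-D space.
    norms = np.linalg.norm(chroma, axis=1, keepdims=True)
    centroids = (chroma / np.maximum(norms, 1e-9)) @ PHI.T   # (n_frames, 6)
    # Distance between frames n+1 and n-1 gives the measure at frame n.
    return np.linalg.norm(centroids[2:] - centroids[:-2], axis=1)

# Example: a C major chord changing to F major produces a peak at the boundary.
c_maj = np.zeros(12); c_maj[[0, 4, 7]] = 1
f_maj = np.zeros(12); f_maj[[5, 9, 0]] = 1
frames = np.vstack([np.tile(c_maj, (5, 1)), np.tile(f_maj, (5, 1))])
print(harmonic_change(frames))
```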

266 citations