Author

Juan José Burred

Bio: Juan José Burred is an academic researcher from IRCAM. The author has contributed to research in topics: Source separation & Audio signal processing. The author has an h-index of 12 and has co-authored 30 publications receiving 445 citations. Previous affiliations of Juan José Burred include Technical University of Berlin & Free University of Berlin.

Papers
Journal Article
TL;DR: The design, implementation, and evaluation of a system for automatic audio signal classification is presented; signals are classified by audio type, differentiating between three speech classes, 13 musical genres, and background noise.
Abstract: The design, implementation, and evaluation of a system for automatic audio signal classification is presented. The signals are classified according to audio type, differentiating between three speech classes, 13 musical genres, and background noise. A large number of audio features are evaluated for their suitability in such a classification task, including MPEG-7 descriptors and several new features. The selection of the features is carried out systematically with regard to their robustness to noise and bandwidth changes, as well as to their ability to distinguish a given set of audio types. Direct and hierarchical approaches for the feature selection and for the classification are evaluated and compared.

76 citations
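A minimal sketch of this kind of two-stage pipeline, assuming librosa and scikit-learn; the MFCC summary features and the speech/music/noise-then-genre hierarchy are illustrative stand-ins for the paper's MPEG-7 descriptors and feature-selection procedure, not its actual design:

```python
# Minimal sketch of a hierarchical audio-type classifier in the spirit of the
# paper. Feature choice (MFCC statistics) and the two-stage hierarchy are
# illustrative assumptions, not the paper's exact feature set or topology.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def features(path):
    """Summarize a clip as mean/std of MFCCs (a stand-in for the paper's
    MPEG-7 and custom descriptors)."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Stage 1: coarse audio type (speech / music / noise).
# Stage 2: genre, trained only on the music subset.
coarse = RandomForestClassifier(n_estimators=200, random_state=0)
genre = RandomForestClassifier(n_estimators=200, random_state=0)

def fit(paths, coarse_labels, genre_labels):
    # genre_labels may hold None for non-music clips; they are masked out.
    X = np.stack([features(p) for p in paths])
    coarse.fit(X, coarse_labels)
    is_music = np.asarray(coarse_labels) == "music"
    genre.fit(X[is_music], np.asarray(genre_labels, dtype=object)[is_music])

def predict(path):
    x = features(path).reshape(1, -1)
    top = coarse.predict(x)[0]
    return (top, genre.predict(x)[0]) if top == "music" else (top, None)
```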

Journal ArticleDOI
TL;DR: An experimental paradigm that combines state-of-the-art voice transformation algorithms with psychophysical reverse correlation is introduced and shows that two of the most important dimensions of social judgments, a speaker’s perceived dominance and trustworthiness, are driven by robust and distinguishing pitch trajectories in short utterances like the word “Hello.”
Abstract: Human listeners excel at forming high-level social representations about each other, even from the briefest of utterances. In particular, pitch is widely recognized as the auditory dimension that conveys most of the information about a speaker’s traits, emotional states, and attitudes. While past research has primarily looked at the influence of mean pitch, almost nothing is known about how intonation patterns, i.e., finely tuned pitch trajectories around the mean, may determine social judgments in speech. Here, we introduce an experimental paradigm that combines state-of-the-art voice transformation algorithms with psychophysical reverse correlation and show that two of the most important dimensions of social judgments, a speaker’s perceived dominance and trustworthiness, are driven by robust and distinguishing pitch trajectories in short utterances like the word “Hello,” which remained remarkably stable whether male or female listeners judged male or female speakers. These findings reveal a unique communicative adaptation that enables listeners to infer social traits regardless of speakers’ physical characteristics, such as sex and mean pitch. By characterizing how any given individual’s mental representations may differ from this generic code, the method introduced here opens avenues to explore dysprosody and social-cognitive deficits in disorders like autism spectrum and schizophrenia. In addition, once derived experimentally, these prototypes can be applied to novel utterances, thus providing a principled way to modulate personality impressions in arbitrary speech signals.

56 citations
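The reverse-correlation logic can be illustrated with a toy simulation: random pitch contours perturb a base utterance, a listener repeatedly picks which of two variants sounds more trustworthy, and averaging chosen minus rejected contours estimates the internal prototype. The sketch below (plain NumPy, with a simulated listener and made-up numbers) abstracts away the voice transformation entirely:

```python
# Toy sketch of psychophysical reverse correlation on pitch contours.
# Actual stimuli are resynthesized utterances; here the manipulation is
# reduced to a random pitch contour (in cents) over 7 breakpoints, and
# the "listener" is simulated. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N_TRIALS, N_POINTS = 500, 7

def random_contour():
    return rng.normal(0.0, 70.0, N_POINTS)  # pitch offsets in cents

# Hypothetical internal prototype the simulated listener "prefers"
# (e.g., a final pitch rise for trustworthiness).
prototype = np.linspace(-40, 40, N_POINTS)

chosen, rejected = [], []
for _ in range(N_TRIALS):
    a, b = random_contour(), random_contour()
    # Two-interval forced choice: pick the contour closer to the prototype.
    pick_a = np.dot(a, prototype) > np.dot(b, prototype)
    chosen.append(a if pick_a else b)
    rejected.append(b if pick_a else a)

# First-order reverse-correlation kernel: the mean chosen minus mean
# rejected contour recovers the shape of the prototype (up to scale).
kernel = np.mean(chosen, axis=0) - np.mean(rejected, axis=0)
print(np.round(kernel, 1))
```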

Journal ArticleDOI
TL;DR: A computational model of musical instrument sounds that focuses on capturing the dynamic behavior of the spectral envelope, which results in a compact representation in the form of a set of prototype curves in feature space, or equivalently of prototype spectro-temporal envelopes in the time-frequency domain.
Abstract: We present a computational model of musical instrument sounds that focuses on capturing the dynamic behavior of the spectral envelope. A set of spectro-temporal envelopes belonging to different notes of each instrument are extracted by means of sinusoidal modeling and subsequent frequency interpolation, before being subjected to principal component analysis. The prototypical evolution of the envelopes in the obtained reduced-dimensional space is modeled as a nonstationary Gaussian Process. This results in a compact representation in the form of a set of prototype curves in feature space, or equivalently of prototype spectro-temporal envelopes in the time-frequency domain. Finally, the obtained models are successfully evaluated in the context of two music content analysis tasks: classification of instrument samples and detection of instruments in monaural polyphonic mixtures.

45 citations
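The representation step lends itself to a short sketch: per-frame spectral envelopes are projected with PCA so that each note becomes a trajectory in a low-dimensional space. The sinusoidal analysis, frequency interpolation, and Gaussian-process prototype fitting are omitted here, and the input envelopes are synthetic:

```python
# Sketch of the representation step: frame-wise spectral envelopes are
# projected with PCA so each note becomes a curve in a low-dimensional
# feature space. Envelope extraction and GP prototype fitting are omitted.
import numpy as np
from sklearn.decomposition import PCA

def envelope_matrix(notes):
    """Stack per-frame log spectral envelopes from all notes.
    `notes` is a list of (n_frames, n_bins) arrays, assumed precomputed
    by sinusoidal analysis + interpolation onto a common frequency grid."""
    return np.vstack(notes)

def to_trajectories(notes, n_components=3):
    pca = PCA(n_components=n_components)
    pca.fit(envelope_matrix(notes))
    # Each note -> (n_frames, n_components) curve; the paper models the
    # distribution of such curves per instrument as a nonstationary GP.
    return [pca.transform(n) for n in notes], pca

# Example with synthetic envelopes: 2 "notes", 100 frames, 257 bins each.
rng = np.random.default_rng(1)
notes = [rng.standard_normal((100, 257)) for _ in range(2)]
curves, model = to_trajectories(notes)
print(curves[0].shape)  # (100, 3)
```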

Proceedings Article
01 Jan 2007
TL;DR: This paper proposes a framework for the sound source separation and timbre classification of polyphonic, multi-instrumental music signals, inspired by ideas from Computational Auditory Scene Analysis and formulated as a graph partitioning problem.
Abstract: The identification of the instruments playing in a polyphonic music signal is an important and unsolved problem in Music Information Retrieval. In this paper, we propose a framework for the sound source separation and timbre classification of polyphonic, multi-instrumental music signals. The sound source separation method is inspired by ideas from Computational Auditory Scene Analysis and formulated as a graph partitioning problem. It utilizes a sinusoidal analysis front-end and makes use of the normalized cut, applied as a global criterion for segmenting graphs. Timbre models for six musical instruments are used for the classification of the resulting sound sources. The proposed framework is evaluated on a dataset consisting of mixtures of a variable number of simultaneous pitches and instruments, up to a maximum of four concurrent notes.

37 citations
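As a rough illustration of the grouping step, sinusoidal peaks can be treated as nodes of a similarity graph and partitioned with spectral clustering, whose objective is a relaxation of the normalized cut; the Gaussian similarity on time and log-frequency below is a simplified stand-in for the paper's CASA-inspired cues:

```python
# Sketch of the grouping step: sinusoidal peaks (time, log-frequency) are
# nodes of a similarity graph, partitioned with spectral clustering, a
# relaxation of the normalized cut used in the paper. The similarity
# function is a simplified stand-in for the paper's perceptual cues.
import numpy as np
from sklearn.cluster import SpectralClustering

def group_peaks(peaks, n_sources):
    """peaks: (n, 2) array of (weighted time, log2 frequency) per peak."""
    sc = SpectralClustering(
        n_clusters=n_sources,
        affinity="rbf",        # Gaussian similarity on the peak features
        gamma=5.0,             # kernel width, tuned per representation
        assign_labels="kmeans",
        random_state=0,
    )
    return sc.fit_predict(peaks)

# Two synthetic "sources": peaks around 220 Hz and 440 Hz.
rng = np.random.default_rng(2)
t = rng.uniform(0, 1, 200)
f = np.where(rng.random(200) < 0.5, np.log2(220), np.log2(440))
f += rng.normal(0, 0.02, 200)
# Time is weighted down so the frequency cue dominates the partition.
labels = group_peaks(np.column_stack([0.1 * t, f]), n_sources=2)
print(np.bincount(labels))
```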

Proceedings ArticleDOI
25 Mar 2012
TL;DR: A novel application of genetic motif discovery to symbolic sequence representations of sound is proposed for audio event detection; recurrent, structurally salient sound events are found in an unsupervised and query-less manner, and the discovered motifs can be interpreted as statistical temporal models of spectral evolution.
Abstract: We introduce a novel application of genetic motif discovery in symbolic sequence representations of sound for audio event detection. Sounds are represented as a set of parallel symbolic sequences, each symbol representing a spectral shape, and each layer indicating the contribution weights of each spectral shape to the sound. Such layered symbolic representations are input to a genetic motif discovery algorithm that detects and clusters recurrent and structurally salient sound events in an unsupervised and query-less manner. The discovered motifs can be interpreted as statistical temporal models of spectral evolution. The system is successfully evaluated in two tasks: environmental sound event detection, and drum onset detection.

28 citations
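The symbolization stage feeding the motif discovery can be sketched with NMF: the spectrogram is factored into spectral shapes and activations, and each component's activation is quantized into a small alphabet, yielding the parallel symbolic layers. The genetic motif discovery itself is replaced here by a naive recurrent-substring count, purely for illustration:

```python
# Sketch of the symbolization step: NMF factors the spectrogram into
# spectral shapes and activations; each component's activation is
# quantized into a 3-letter alphabet, giving parallel symbolic layers.
# The genetic motif discovery is replaced by a naive substring count.
import numpy as np
from sklearn.decomposition import NMF
from collections import Counter

def symbolize(spectrogram, n_shapes=4, levels="abc"):
    nmf = NMF(n_components=n_shapes, init="nndsvda", max_iter=400)
    H = nmf.fit_transform(spectrogram.T).T     # (n_shapes, n_frames)
    edges = np.quantile(H, [1/3, 2/3], axis=1) # per-layer tercile thresholds
    layers = []
    for h, (lo, hi) in zip(H, edges.T):
        layers.append("".join(levels[int(v > lo) + int(v > hi)] for v in h))
    return layers  # one symbol string per spectral shape

def recurrent_motifs(layer, length=4, min_count=2):
    counts = Counter(layer[i:i + length] for i in range(len(layer) - length + 1))
    return [(m, c) for m, c in counts.most_common() if c >= min_count]

rng = np.random.default_rng(3)
S = np.abs(rng.standard_normal((128, 300)))  # stand-in magnitude spectrogram
layers = symbolize(S)
print(recurrent_motifs(layers[0])[:3])
```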


Cited by

01 Jan 2005
TL;DR: In this article, a general technique called Bubbles is proposed to assign the credit of human categorization performance to specific visual information; it is illustrated on three face categorization tasks (gender, expressive or not, and identity).
Abstract: Everyday, people flexibly perform different categorizations of common faces, objects and scenes. Intuition and scattered evidence suggest that these categorizations require the use of different visual information from the input. However, there is no unifying method, based on the categorization performance of subjects, that can isolate the information used. To this end, we developed Bubbles, a general technique that can assign the credit of human categorization performance to specific visual information. To illustrate the technique, we applied Bubbles on three categorization tasks (gender, expressive or not and identity) on the same set of faces, with human and ideal observers to compare the features they used.

623 citations
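The Bubbles logic is easy to simulate: each trial reveals a stimulus through random Gaussian apertures, and relating aperture locations to response accuracy maps the image regions that drive the categorization. Everything below (image size, the simulated observer, the assumed diagnostic region) is an illustrative assumption:

```python
# Toy sketch of the Bubbles analysis: random Gaussian apertures reveal
# parts of an image; visibility on correct trials, normalized by total
# visibility, highlights the regions driving performance.
import numpy as np

rng = np.random.default_rng(4)
SIZE, N_BUBBLES, SIGMA, N_TRIALS = 64, 10, 4.0, 1000
yy, xx = np.mgrid[0:SIZE, 0:SIZE]

def bubble_mask():
    mask = np.zeros((SIZE, SIZE))
    for _ in range(N_BUBBLES):
        cy, cx = rng.integers(0, SIZE, 2)
        mask += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * SIGMA**2))
    return np.clip(mask, 0, 1)

# Hypothetical diagnostic region (e.g., the "eyes") the task depends on.
diagnostic = ((yy - 20) ** 2 + (xx - 32) ** 2) < 8**2

correct_sum = np.zeros((SIZE, SIZE))
mask_sum = np.zeros((SIZE, SIZE))
for _ in range(N_TRIALS):
    m = bubble_mask()
    # Simulated observer: more likely correct when the diagnostic
    # region is visible through the bubbles.
    p_correct = 0.5 + 0.5 * m[diagnostic].mean()
    if rng.random() < p_correct:
        correct_sum += m
    mask_sum += m

# Visibility on correct trials relative to overall visibility; the
# diagnostic region should score higher on average.
diagnostic_map = correct_sum / np.maximum(mask_sum, 1e-9)
print(diagnostic_map[20, 32], diagnostic_map[50, 50])
```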

Journal ArticleDOI
01 Dec 2013
TL;DR: Limitations of current transcription methods are analysed and promising directions for future research are identified, including the integration of information from multiple algorithms and different musical aspects.
Abstract: Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available are a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.

298 citations

Journal ArticleDOI
TL;DR: Bregman argues that there are two kinds of principle for auditory grouping and segregation, schema-based and primitive, and provides a comprehensive review and interpretation of perceptual experiments up to about 1989, so his book pre-dates recent attempts to implement auditory grouping principles in computational models.
Abstract: The world is full of sources of sound. As I write this review, I can hear the humming of the word processor, the creaking of a door in the wind, the distant rumble of an aeroplane, the passage of a car close by, a bird twittering, my neighbour talking on his doorstep, music from his son’s hi-fi, and someone speaking on the radio in the next room. Although each source generates a particular pattern of changes in air-pressure, the changes have summed together by the time they reach my ears, yet I perceive each source distinctly. What principles of perceptual grouping and segregation do listeners use to partition such mixtures of sound? Which principles are applied automatically to all sounds? Which are specialized for particular classes of sound, such as speech? In what ways have the principles been exploited in musical composition? These are the major concerns of this lengthy, scholarly, but readable book. Bregman’s approach is functional not physiological, empirical not computational. He provides a comprehensive review and interpretation of perceptual experiments up to about 1989, so his book pre-dates recent attempts to implement auditory grouping principles in computational models and to find a physiological substrate for them. One important distinction is sustained throughout the book. Bregman argues that there are two kinds of principle for auditory grouping and segregation: “schema-based” and “primitive”. Schema-based principles are specific to particular types of source. They are learnt by listeners, and their application is under attentional control. One example may be the use of the knowledge of the timbre of an instrument to follow its part in an ensemble. Another example may be the use of phonetic knowledge to integrate acoustic cues in speech perception. Primitive grouping principles, in contrast, are innate, learnt through evolution. They automatically exploit fundamental physical properties of sounds and sound sources. For example: the sizes of resonators generally change slowly; they often generate energy simultaneously over a wide frequency range; when they vibrate, they create energy at the discrete

273 citations