Author

Xavier Serra

Other affiliations: Yamaha Corporation, IRCAM, Spanish National Research Council
Bio: Xavier Serra is an academic researcher from Pompeu Fabra University. The author has contributed to research on music information retrieval and convolutional neural networks. The author has an h-index of 46 and has co-authored 366 publications receiving 9,263 citations. Previous affiliations of Xavier Serra include Yamaha Corporation and IRCAM.


Papers
Book Chapter
01 Jan 1997
TL;DR: When generating musical sound on a digital computer, it is important to have a good model whose parameters provide a rich source of meaningful sound transformations.
Abstract: When generating musical sound on a digital computer, it is important to have a good model whose parameters provide a rich source of meaningful sound transformations. Three basic model types are in prevalent use today for musical sound generation: instrument models, spectrum models, and abstract models. Instrument models attempt to parametrize a sound at its source, such as a violin, clarinet, or vocal tract. Spectrum models attempt to parametrize a sound at the basilar membrane of the ear, discarding whatever information the ear seems to discard in the spectrum. Abstract models, such as FM, attempt to provide musically useful parameters in an abstract formula.
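As a concrete illustration of the abstract-model category, FM synthesis can be written in a few lines. This is a generic sketch, not code from any of the works listed here; the carrier frequency, modulator frequency, and modulation index below are arbitrary illustrative values.

```python
import numpy as np

def fm_tone(duration=1.0, sr=44100, fc=440.0, fm=110.0, index=2.0, amp=0.5):
    """Basic FM synthesis: a sine carrier whose phase is modulated by another sine."""
    t = np.arange(int(duration * sr)) / sr
    return amp * np.sin(2 * np.pi * fc * t + index * np.sin(2 * np.pi * fm * t))

tone = fm_tone()  # one second of a 440 Hz carrier modulated at 110 Hz
```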

390 citations

Proceedings ArticleDOI
15 Apr 2018
TL;DR: The proposed model adaptation retains Wavenet's powerful acoustic modeling capabilities, while significantly reducing its time-complexity by eliminating its autoregressive nature.
Abstract: Most speech processing techniques use magnitude spectrograms as a front-end and therefore discard part of the signal by default: the phase. In order to overcome this limitation, we propose an end-to-end learning method for speech denoising based on Wavenet. The proposed model adaptation retains Wavenet's powerful acoustic modeling capabilities, while significantly reducing its time-complexity by eliminating its autoregressive nature. Specifically, the model makes use of non-causal, dilated convolutions and predicts target fields instead of a single target sample. The discriminative adaptation of the model we propose learns in a supervised fashion by minimizing a regression loss. These modifications make the model highly parallelizable during both training and inference. Both quantitative and qualitative evaluations indicate that the proposed method is preferred over Wiener filtering, a common method based on processing the magnitude spectrogram.
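The architecture described in the abstract can be sketched roughly as a stack of non-causal, dilated 1-D convolutions trained with a regression loss. The PyTorch code below is a minimal illustration under assumed settings (layer count, channel width, dilation pattern, and an L1 loss), not the authors' published configuration.

```python
import torch
import torch.nn as nn

class DilatedDenoiser(nn.Module):
    """Sketch of a non-causal, dilated 1-D conv stack that regresses a clean
    waveform segment (a "target field") from a noisy one."""
    def __init__(self, channels=64, dilations=(1, 2, 4, 8, 16, 32)):
        super().__init__()
        layers = [nn.Conv1d(1, channels, kernel_size=3, padding=1)]
        for d in dilations:
            # padding = dilation keeps the output length equal to the input
            # length, and symmetric (non-causal) padding lets each output
            # sample see both past and future context.
            layers += [nn.Conv1d(channels, channels, kernel_size=3,
                                 dilation=d, padding=d), nn.ReLU()]
        layers.append(nn.Conv1d(channels, 1, kernel_size=1))
        self.net = nn.Sequential(*layers)

    def forward(self, noisy):        # noisy: (batch, 1, samples)
        return self.net(noisy)       # denoised estimate, same shape

model = DilatedDenoiser()
noisy = torch.randn(4, 1, 16000)     # stand-in for 1 s of 16 kHz audio
clean = torch.randn(4, 1, 16000)     # stand-in for the clean targets
loss = nn.functional.l1_loss(model(noisy), clean)
loss.backward()                      # one supervised regression step
```

Because no layer is autoregressive, every output sample of a segment can be computed in parallel, which is the source of the speed-up the abstract refers to.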

387 citations

Proceedings ArticleDOI
04 Nov 2013
TL;DR: Paper presented at the 14th International Society for Music Information Retrieval Conference, held in Curitiba (Brazil) from 4 to 8 November 2013.
Abstract: Paper presented at the 14th International Society for Music Information Retrieval Conference, held in Curitiba (Brazil) from 4 to 8 November 2013.

330 citations

Book
01 Jan 1989
TL;DR: This dissertation introduces a new analysis/synthesis method designed to obtain musically useful intermediate representations for sound transformations; the resulting representation is flexible and well suited to the manipulation of sounds.
Abstract: This dissertation introduces a new analysis/synthesis method. It is designed to obtain musically useful intermediate representations for sound transformations. The method's underlying model assumes that a sound is composed of a deterministic component plus a stochastic one. The deterministic component is represented by a series of sinusoids that are described by amplitude and frequency functions. The stochastic component is represented by a series of magnitude-spectrum envelopes that function as a time-varying filter excited by white noise. Together these representations make it possible for a synthesized sound to attain all the perceptual characteristics of the original sound. At the same time the representation is easily modified to create a wide variety of new sounds. This analysis/synthesis technique is based on the short-time Fourier transform (STFT). From the set of spectra returned by the STFT, the relevant peaks of each spectrum are detected and used as breakpoints in a set of frequency trajectories. The deterministic signal is obtained by synthesizing a sinusoid from each trajectory. Then, in order to obtain the stochastic component, a set of spectra of the deterministic component is computed, and these spectra are subtracted from the spectra of the original sound. The resulting spectral residuals are approximated by a series of envelopes, from which the stochastic signal is generated by performing an inverse STFT. The result is a method that is appropriate for the manipulation of sounds. The intermediate representation is very flexible and musically useful in that it offers unlimited possibilities for transformation.
Acknowledgements: I wish to thank my main advisor Prof. John Chowning for his support throughout the course of this research. Thanks also to Prof. Julius Smith, without whom this dissertation could not have even been imagined. His teaching in signal processing, his incredible enthusiasm, and his practically infinite supply of ideas made this work possible. I have to acknowledge Prof. Earl Schubert for always being ready to help with the most diverse problems that I have encountered along the way.
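The decomposition described above can be sketched in a few lines of Python. This is only a rough illustration under simplifying assumptions, not the dissertation's actual algorithm: a full implementation tracks peak trajectories across frames, synthesizes sinusoids from those trajectories, and fits envelopes to the residual, whereas the sketch below simply keeps the strongest spectral peaks per frame as the deterministic part and treats the remainder as the stochastic residual. The sample rate, window size, and peak count are arbitrary.

```python
import numpy as np
from scipy.signal import stft, istft, find_peaks

SR, NPERSEG = 44100, 2048  # assumed analysis parameters

def decompose(x, n_peaks=20):
    """Crude deterministic/stochastic split: keep the strongest spectral
    peaks per frame as the deterministic part, leave the rest as residual."""
    f, t, X = stft(x, fs=SR, nperseg=NPERSEG)
    mag, phase = np.abs(X), np.angle(X)

    det_mag = np.zeros_like(mag)
    for j in range(mag.shape[1]):                      # frame by frame
        peaks, _ = find_peaks(mag[:, j])
        if peaks.size:
            strongest = peaks[np.argsort(mag[peaks, j])[-n_peaks:]]
            det_mag[strongest, j] = mag[strongest, j]  # keep only peak bins

    # Deterministic signal: inverse STFT of the peak-only spectra.
    _, deterministic = istft(det_mag * np.exp(1j * phase), fs=SR, nperseg=NPERSEG)

    # Stochastic residual: what remains after removing the peaks; a full
    # implementation would fit envelopes to this and drive them with noise.
    res_mag = np.maximum(mag - det_mag, 0.0)
    _, stochastic = istft(res_mag * np.exp(1j * phase), fs=SR, nperseg=NPERSEG)
    return deterministic, stochastic

x = np.random.randn(SR)          # stand-in for a real one-second sound
det, sto = decompose(x)
```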

329 citations


Cited by
Journal ArticleDOI


08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: it seemed an odd beast at first, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal Article
TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

13,333 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
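As a concrete instance of the fourth category (the per-user mail filter), a minimal sketch with scikit-learn's naive Bayes text classifier is shown below; the toy messages and labels are invented for illustration, and any real filter would be trained on the user's own accept/reject decisions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy examples of messages a particular user kept (0) or rejected (1).
messages = [
    "meeting moved to 3pm, agenda attached",
    "lunch tomorrow?",
    "WIN a FREE cruise, click now",
    "cheap meds, limited offer",
]
labels = [0, 0, 1, 1]

# The filtering rules are learned from the user's decisions, then reused on new mail.
mail_filter = make_pipeline(CountVectorizer(), MultinomialNB())
mail_filter.fit(messages, labels)

print(mail_filter.predict(["free offer, click here"]))          # likely [1]
print(mail_filter.predict(["agenda for tomorrow's meeting"]))   # likely [0]
```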

13,246 citations

Journal ArticleDOI
06 Jun 1986-JAMA
TL;DR: The editors have done a masterful job of weaving together the biologic, the behavioral, and the clinical sciences into a single tapestry in which everyone from the molecular biologist to the practicing psychiatrist can find and appreciate his or her own research.
Abstract: I have developed "tennis elbow" from lugging this book around the past four weeks, but it is worth the pain, the effort, and the aspirin. It is also worth the (relatively speaking) bargain price. Including appendixes, this book contains 894 pages of text. The entire panorama of the neural sciences is surveyed and examined, and it is comprehensive in its scope, from genomes to social behaviors. The editors explicitly state that the book is designed as "an introductory text for students of biology, behavior, and medicine," but it is hard to imagine any audience, interested in any fragment of neuroscience at any level of sophistication, that would not enjoy this book. The editors have done a masterful job of weaving together the biologic, the behavioral, and the clinical sciences into a single tapestry in which everyone from the molecular biologist to the practicing psychiatrist can find and appreciate his or

7,563 citations