scispace - formally typeset
Search or ask a question
Author

Jean Laroche

Other affiliations: Télécom ParisTech, Orange S.A., Creative Technology  ...read more
Bio: Jean Laroche is an academic researcher from Audience. The author has contributed to research in topics: Audio signal & Frequency domain. The author has an hindex of 29, co-authored 75 publications receiving 3215 citations. Previous affiliations of Jean Laroche include Télécom ParisTech & Orange S.A..


Papers
More filters
Journal ArticleDOI
TL;DR: This contribution reviews frequency-domain algorithms (phase-vocoder) and time- domain algorithms (Time-Domain Pitch-Synchronous Overlap/Add and the like) in the same framework and presents more recent variations of these schemes.

363 citations

Journal ArticleDOI
TL;DR: This paper examines the problem of phasiness in the context of time-scale modification and provides new insights into its causes, and two extensions to the standard phase vocoder algorithm are introduced, and the resulting sound quality is shown to be significantly improved.
Abstract: The phase vocoder is a well established tool for time scaling and pitch shifting speech and audio signals via modification of their short-time Fourier transforms (STFTs). In contrast to time-domain time-scaling and pitch-shifting techniques, the phase vocoder is generally considered to yield high quality results, especially for large modification factors and/or polyphonic signals. However, the phase vocoder is also known for introducing a characteristic perceptual artifact, often described as "phasiness", "reverberation", or "loss of presence". This paper examines the problem of phasiness in the context of time-scale modification and provides new insights into its causes. Two extensions to the standard phase vocoder algorithm are introduced, and the resulting sound quality is shown to be significantly improved. Moreover, the modified phase vocoder is shown to provide a factor-of-two decrease in computational cost.

355 citations

Proceedings ArticleDOI
27 Apr 1993
TL;DR: HNS (harmonic plus noise synthesis), an analysis/modification/synthesis model based on a harmonic plus noise representation of the speech signal, is presented and informal listening-tests demonstrate the effectiveness of this approach for time-scale modifications.
Abstract: HNS (harmonic plus noise synthesis), an analysis/modification/synthesis model based on a harmonic plus noise representation of the speech signal, is presented. Significant improvements over previous work on the subject are proposed at both the analysis and the synthesis stages: a model of harmonically related sinusoids with linearly varying complex amplitudes for the representation of the deterministic part of the signal; a joint time-domain and frequency-domain representations of the stochastic part of the signal; and a pitch-synchronous PSOLA (pitch-synchronized overlap-add)-like synthesis scheme. Informal listening-tests demonstrate the effectiveness of this approach for time-scale modifications. >

168 citations

Patent
Jean Laroche1
15 May 2001
TL;DR: In this paper, a fingerprint of an audio signal is generated based on the energy content in frequency subbands, which is then compared to a database to identify the audio signal, which will be useful for signals altered subsequent to the generation of the fingerprint.
Abstract: A fingerprint of an audio signal is generated based on the energy content in frequency subbands. Processing techniques assure a robust identification fingerprint that will be useful for signals altered subsequent to the generation of the fingerprint. The fingerprint is compared to a database to identify the audio signal.

167 citations

Proceedings ArticleDOI
17 Oct 1999
TL;DR: The phase-vocoder is usually presented as a high-quality solution for time-scale modification of signals, pitch-scale modifications usually being implemented as a combination of timescaling and sampling rate conversion.
Abstract: The phase-vocoder is usually presented as a high-quality solution for time-scale modification of signals, pitch-scale modifications usually being implemented as a combination of timescaling and sampling rate conversion. We present two new phase-vocoder-based techniques which allow direct manipulation of the signal in the frequency-domain, enabling such applications as pitch-shifting, chorusing, harmonizing, partial stretching and other exotic modifications which cannot be achieved by the standard time-scale sampling-rate conversion scheme. The new techniques are based on a very simple peak-detection stage, followed by a peak-shifting stage. The very simplest one allows for 50% overlap but restricts the precision of the modifications, while the most flexible techniques requires a more expensive 75% overlap.

136 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: The automatic classification of audio signals into an hierarchy of musical genres is explored and three feature sets for representing timbral texture, rhythmic content and pitch content are proposed.
Abstract: Musical genres are categorical labels created by humans to characterize pieces of music. A musical genre is characterized by the common characteristics shared by its members. These characteristics typically are related to the instrumentation, rhythmic structure, and harmonic content of the music. Genre hierarchies are commonly used to structure the large collections of music available on the Web. Currently musical genre annotation is performed manually. Automatic musical genre classification can assist or replace the human user in this process and would be a valuable addition to music information retrieval systems. In addition, automatic musical genre classification provides a framework for developing and evaluating features for any type of content-based analysis of musical signals. In this paper, the automatic classification of audio signals into an hierarchy of musical genres is explored. More specifically, three feature sets for representing timbral texture, rhythmic content and pitch content are proposed. The performance and relative importance of the proposed features is investigated by training statistical pattern recognition classifiers using real-world audio collections. Both whole file and real-time frame-based classification schemes are described. Using the proposed feature sets, classification of 61% for ten musical genres is achieved. This result is comparable to results reported for human musical genre classification.

2,668 citations

Book
23 Nov 2007
TL;DR: This new edition now contains essential information on steganalysis and steganography, and digital watermark embedding is given a complete update with new processes and applications.
Abstract: Digital audio, video, images, and documents are flying through cyberspace to their respective owners. Unfortunately, along the way, individuals may choose to intervene and take this content for themselves. Digital watermarking and steganography technology greatly reduces the instances of this by limiting or eliminating the ability of third parties to decipher the content that he has taken. The many techiniques of digital watermarking (embedding a code) and steganography (hiding information) continue to evolve as applications that necessitate them do the same. The authors of this second edition provide an update on the framework for applying these techniques that they provided researchers and professionals in the first well-received edition. Steganography and steganalysis (the art of detecting hidden information) have been added to a robust treatment of digital watermarking, as many in each field research and deal with the other. New material includes watermarking with side information, QIM, and dirty-paper codes. The revision and inclusion of new material by these influential authors has created a must-own book for anyone in this profession. *This new edition now contains essential information on steganalysis and steganography *New concepts and new applications including QIM introduced *Digital watermark embedding is given a complete update with new processes and applications

1,773 citations

Journal ArticleDOI
TL;DR: A set of simple new procedures has been developed to enable the real-time manipulation of speech parameters by using pitch-adaptive spectral analysis combined with a surface reconstruction method in the time–frequency region.

1,741 citations

Journal ArticleDOI
TL;DR: The design of a new methodology for representing the relationship between two sets of spectral envelopes and the proposed transform greatly improves the quality and naturalness of the converted speech signals compared with previous proposed conversion methods.
Abstract: Voice conversion, as considered in this paper, is defined as modifying the speech signal of one speaker (source speaker) so that it sounds as if it had been pronounced by a different speaker (target speaker). Our contribution includes the design of a new methodology for representing the relationship between two sets of spectral envelopes. The proposed method is based on the use of a Gaussian mixture model of the source speaker spectral envelopes. The conversion itself is represented by a continuous parametric function which takes into account the probabilistic classification provided by the mixture model. The parameters of the conversion function are estimated by least squares optimization on the training data. This conversion method is implemented in the context of the HNM (harmonic+noise model) system, which allows high-quality modifications of speech signals. Compared to earlier methods based on vector quantization, the proposed conversion scheme results in a much better match between the converted envelopes and the target envelopes. Evaluation by objective tests and formal listening tests shows that the proposed transform greatly improves the quality and naturalness of the converted speech signals compared with previous proposed conversion methods.

1,109 citations

Journal ArticleDOI
S. Biyiksiz1
01 Mar 1985
TL;DR: This book by Elliott and Rao is a valuable contribution to the general areas of signal processing and communications and can be used for a graduate level course in perhaps two ways.
Abstract: There has been a great deal of material in the area of discrete-time transforms that has been published in recent years. This book does an excellent job of presenting important aspects of such material in a clear manner. The book has 11 chapters and a very useful appendix. Seven of these chapters are essentially devoted to the Fourier series/transform, discrete Fourier transform, fast Fourier transform (FFT), and applications of the FFT in the area of spectral estimation. Chapters 8 through 10 deal with many other discrete-time transforms and algorithms to compute them. Of these transforms, the KarhunenLoeve, the discrete cosine, and the Walsh-Hadamard transform are perhaps the most well-known. A lucid discussion of number theoretic transforms i5 presented in Chapter 11. This reviewer feels that the authors have done a fine job of compiling the pertinent material and presenting it in a concise and clear manner. There are a number of problems at the end of each chapter, an appreciable number of which are challenging. The authors have included a comprehensive set of references at the end of the book. In brief, this book is a valuable contribution to the general areas of signal processing and communications. It can be used for a graduate level course in perhaps two ways. One would be to cover the first seven chapters in great detail. The other would be to cover the whole book by focussing on different topics in a selective manner. This book by Elliott and Rao is extremely useful to researchers/engineers who are working in the areas of signal processing and communications. It i s also an excellent reference book, and hence a valuable addition to one’s library

843 citations