Proceedings Article

PARSHL: An analysis/synthesis program for non-harmonic sounds based on a sinusoidal representation

01 Jan 1987-Vol. 1987
TL;DR: A peak-tracking spectrum analyzer, called Parshl, which is useful for extracting additive synthesis parameters from inharmonic sounds such as the piano, based on the Short-Time Fourier Transform.
Abstract: This paper describes a peak-tracking spectrum analyzer, called Parshl, which is useful for extracting additive synthesis parameters from inharmonic sounds such as the piano. Parshl is based on the Short-Time Fourier Transform (STFT), adding features for tracking the amplitude, frequency, and phase trajectories of spectral lines from one FFT to the next. Parshl can be thought of as an “inharmonic phase vocoder” which uses tracking vocoder analysis channels instead of a fixed harmonic filter bank as used in previous FFT-based vocoders. This is the original full version from which the Technical Report (CCRMA STAN-M-43) and conference paper (ICMC-87) were prepared. Additionally, minor corrections are included, and a few pointers to more recent work have been added. Work supported in part by Dynacord, Inc., 1985
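The frame-to-frame tracking described in the abstract can be illustrated with a small sketch. This is not Parshl's actual matching rule: it is a simplified greedy nearest-frequency match, and the function name `continue_tracks`, the parameter `max_jump_hz`, and the example frequencies are all hypothetical, chosen only for illustration.

```python
def continue_tracks(track_freqs, peak_freqs, max_jump_hz):
    """Greedily continue each track into the next frame (illustrative only).

    track_freqs: last known frequency (Hz) of each active track.
    peak_freqs:  frequencies (Hz) of the spectral peaks found in the next frame.
    Returns (matches, births): matches[i] is the index of the peak claimed by
    track i, or None if the track found no peak within max_jump_hz; births
    lists the peaks left unclaimed, which could start new tracks.
    """
    unclaimed = set(range(len(peak_freqs)))
    matches = []
    for freq in track_freqs:
        best = None
        # sorted() keeps tie-breaking deterministic (lowest index wins).
        for j in sorted(unclaimed):
            dist = abs(peak_freqs[j] - freq)
            if dist <= max_jump_hz and (
                best is None or dist < abs(peak_freqs[best] - freq)
            ):
                best = j
        if best is not None:
            unclaimed.discard(best)
        matches.append(best)
    return matches, sorted(unclaimed)


# Two tracks from the previous frame, three peaks in the new frame.
matches, births = continue_tracks([440.0, 883.0], [445.0, 1500.0, 880.0], 20.0)
```

Here the first track continues to the peak at 445 Hz, the second to the peak at 880 Hz, and the peak at 1500 Hz is left over as a candidate for a new track. A practical tracker would also carry amplitude and phase along each track, as the abstract notes.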
Citations
01 Sep 1978
TL;DR: The parts of this book of most interest and value to the EMC engineer will be the chapters on Thermal Noise, Antennas, Propagation and Transmission Lines, and Reflection and Refraction.
Abstract: dix A. Even if you don’t choose to memorize them, this system aids in reference and retrieval of important formulas. The book was compiled from notes developed during eight years of teaching a graduate course on the subject and was used as a text. Thus it has been student tested. Appendix F contains a number of problems, grouped to be used on a chapter-by-chapter basis. The problems are designed to illustrate practical applications of the text material. The parts of this book of most interest and value to the EMC engineer will be the chapters on Thermal Noise, Antennas, Propagation and Transmission Lines, and Reflection and Refraction. This is not to downgrade the chapters on Statistics and Its Applications, Signal Processing and Detection, and Some System Characteristics, which also contain much potentially useful material. Additional plus values for the book include a list of 40 references, a table of symbols used throughout the book, and a subject index. Some readers may find the condensed type and close line spacing hard to read. It was apparently set up by typewriter using an elite type face with single line spacing. When reduced down to a 6 by 9 5 inch size page it is too crowded for easy reading. In spite of this shortcoming, your reviewer recommends this book as a worthwhile reference in this field of interest.

413 citations

Book Chapter
01 Jan 1997
TL;DR: When generating musical sound on a digital computer, it is important to have a good model whose parameters provide a rich source of meaningful sound transformations.
Abstract: When generating musical sound on a digital computer, it is important to have a good model whose parameters provide a rich source of meaningful sound transformations. Three basic model types are in prevalent use today for musical sound generation: instrument models, spectrum models, and abstract models. Instrument models attempt to parametrize a sound at its source, such as a violin, clarinet, or vocal tract. Spectrum models attempt to parametrize a sound at the basilar membrane of the ear, discarding whatever information the ear seems to discard in the spectrum. Abstract models, such as FM, attempt to provide musically useful parameters in an abstract formula.

390 citations

Book
01 Jan 1989
TL;DR: This dissertation introduces a new analysis/synthesis method, based on a deterministic plus stochastic model of sound, designed to obtain musically useful intermediate representations for sound transformations.
Abstract: A dissertation submitted to the Department of Music and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of Philosophy. This dissertation introduces a new analysis/synthesis method. It is designed to obtain musically useful intermediate representations for sound transformations. The method's underlying model assumes that a sound is composed of a deterministic component plus a stochastic one. The deterministic component is represented by a series of sinusoids that are described by amplitude and frequency functions. The stochastic component is represented by a series of magnitude-spectrum envelopes that function as a time-varying filter excited by white noise. Together these representations make it possible for a synthesized sound to attain all the perceptual characteristics of the original sound. At the same time the representation is easily modified to create a wide variety of new sounds. This analysis/synthesis technique is based on the short-time Fourier transform (STFT). From the set of spectra returned by the STFT, the relevant peaks of each spectrum are detected and used as breakpoints in a set of frequency trajectories. The deterministic signal is obtained by synthesizing a sinusoid from each trajectory. Then, in order to obtain the stochastic component, a set of spectra of the deterministic component is computed, and these spectra are subtracted from the spectra of the original sound. The resulting spectral residuals are approximated by a series of envelopes, from which the stochastic signal is generated by performing an inverse-STFT. The result is a method that is appropriate for the manipulation of sounds. The intermediate representation is very flexible and musically useful in that it offers unlimited possibilities for transformation. To Eva and Octavi. Acknowledgements: I wish to thank my main advisor Prof. John Chowning for his support throughout the course of this research. Also thanks to Prof. Julius Smith without whom this dissertation could not have even been imagined. His teachings in signal processing, his incredible enthusiasm, and practically infinite supply of ideas made this work possible. I have to acknowledge Prof. Earl Schubert for always being ready to help with the most diverse problems that I have encountered along the way.

329 citations

Journal ArticleDOI
TL;DR: A new two‐way mismatch (TWM) procedure for estimating the fundamental frequency (F0) of quasiharmonic signals is described, which may lead to improved results in this area.
Abstract: Fundamental frequency (F0) estimation for quasiharmonic signals is an important task in music signal processing. Many previously developed techniques have suffered from unsatisfactory performance due to ambiguous spectra, noise perturbations, wide frequency range, vibrato, and other common artifacts encountered in musical signals. In this paper a new two‐way mismatch (TWM) procedure for estimating F0 is described which may lead to improved results in this area. This computer‐based method uses the quasiharmonic assumption to guide a search for F0 based on the short‐time spectra of an input signal. The estimated F0 is chosen to minimize discrepancies between measured partial frequencies and harmonic frequencies generated by trial values of F0. For each trial F0, mismatches between the harmonics generated and the measured partial frequencies are averaged over a fixed subset of the available partials. A weighting scheme is used to reduce the susceptibility of the procedure to the presence of noise or absence ...

175 citations


Additional excerpts

  • ...technique (Smith and Serra, 1987; Maher, 1990 and 1991)....

    [...]

Journal ArticleDOI
TL;DR: This paper proposes an iterative greedy search strategy to estimate F0s one by one, to avoid the combinatorial problem of concurrent F0 estimation, and proposes a polyphony estimation method to terminate the iterative process.
Abstract: This paper presents a maximum-likelihood approach to multiple fundamental frequency (F0) estimation for a mixture of harmonic sound sources, where the power spectrum of a time frame is the observation and the F0s are the parameters to be estimated. When defining the likelihood model, the proposed method models both spectral peaks and non-peak regions (frequencies further than a musical quarter tone from all observed peaks). It is shown that the peak likelihood and the non-peak region likelihood act as a complementary pair. The former helps find F0s that have harmonics that explain peaks, while the latter helps avoid F0s that have harmonics in non-peak regions. Parameters of these models are learned from monophonic and polyphonic training data. This paper proposes an iterative greedy search strategy to estimate F0s one by one, to avoid the combinatorial problem of concurrent F0 estimation. It also proposes a polyphony estimation method to terminate the iterative process. Finally, this paper proposes a postprocessing method to refine polyphony and F0 estimates using neighboring frames. This paper also analyzes the relative contributions of different components of the proposed method. It is shown that the refinement component eliminates many inconsistent estimation errors. Evaluations are done on ten recorded four-part J. S. Bach chorales. Results show that the proposed method shows superior F0 estimation and polyphony estimation compared to two state-of-the-art algorithms.

173 citations


Cites methods from "PARSHL: An analysis/synthesis progr..."

  • ...Experiments are presented in Section VII, and the paper is concluded in Section VIII....

    [...]

References
Journal ArticleDOI
01 Jan 1978
TL;DR: A comprehensive catalog of data windows along with their significant performance parameters from which the different windows can be compared is included, and an example demonstrates the use and value of windows to resolve closely spaced harmonic signals characterized by large differences in amplitude.
Abstract: This paper makes available a concise review of data windows and their effect on the detection of harmonic signals in the presence of broad-band noise, and in the presence of nearby strong harmonic interference. We also call attention to a number of common errors in the application of windows when used with the fast Fourier transform. This paper includes a comprehensive catalog of data windows along with their significant performance parameters from which the different windows can be compared. Finally, an example demonstrates the use and value of windows to resolve closely spaced harmonic signals characterized by large differences in amplitude.
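As a rough illustration of why the choice of data window matters, the sketch below compares spectral leakage from an off-bin cosine with and without a Hann window. It is a minimal example under assumed values (a 64-point frame, a tone at 10.5 bins, leakage measured at bin 25), not an example taken from the paper, and the helper names `dft_bin` and `hann` are hypothetical.

```python
import cmath
import math


def dft_bin(x, k):
    """Return the k-th DFT coefficient of the sequence x (direct summation)."""
    n_total = len(x)
    return sum(
        v * cmath.exp(-2j * math.pi * k * n / n_total) for n, v in enumerate(x)
    )


def hann(n_total):
    """Periodic Hann window of length n_total."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_total) for n in range(n_total)]


N = 64
# A cosine at 10.5 bins sits halfway between DFT bins, the worst case for leakage.
signal = [math.cos(2 * math.pi * 10.5 * n / N) for n in range(N)]
windowed = [s * w for s, w in zip(signal, hann(N))]

far_bin = 25  # a bin well away from the tone
leak_rect = abs(dft_bin(signal, far_bin))    # rectangular (no) window
leak_hann = abs(dft_bin(windowed, far_bin))  # Hann window
```

With these assumed values, the Hann-windowed leakage at the distant bin is far smaller than the unwindowed leakage. This is the property that lets a weak component near a strong one remain visible, at the cost of a wider main lobe.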

7,130 citations


"PARSHL: An analysis/synthesis progr..." refers background in this paper

  • ...Harris [7, 14] gives a good discussion of these windows and many others....

    [...]

Journal ArticleDOI
John Makhoul1
01 Apr 1975
TL;DR: This paper gives an exposition of linear prediction in the analysis of discrete signals, in which the signal is modeled as a linear combination of its past values and present and past values of a hypothetical input to a system whose output is the given signal.
Abstract: This paper gives an exposition of linear prediction in the analysis of discrete signals. The signal is modeled as a linear combination of its past values and present and past values of a hypothetical input to a system whose output is the given signal. In the frequency domain, this is equivalent to modeling the signal spectrum by a pole-zero spectrum. The major part of the paper is devoted to all-pole models. The model parameters are obtained by a least squares analysis in the time domain. Two methods result, depending on whether the signal is assumed to be stationary or nonstationary. The same results are then derived in the frequency domain. The resulting spectral matching formulation allows for the modeling of selected portions of a spectrum, for arbitrary spectral shaping in the frequency domain, and for the modeling of continuous as well as discrete spectra. This also leads to a discussion of the advantages and disadvantages of the least squares error criterion. A spectral interpretation is given to the normalized minimum prediction error. Applications of the normalized error are given, including the determination of an "optimal" number of poles. The use of linear prediction in data compression is reviewed. For purposes of transmission, particular attention is given to the quantization and encoding of the reflection (or partial correlation) coefficients. Finally, a brief introduction to pole-zero modeling is given.
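The all-pole least squares analysis mentioned in the abstract is commonly solved with the autocorrelation method and the Levinson-Durbin recursion. The sketch below is a minimal illustration of that standard approach, not code from the paper; the function names and the second-order test filter (coefficients 1.2 and -0.5) are assumptions made for the example.

```python
def autocorrelation(x, max_lag):
    """Return autocorrelation values r[0..max_lag] of the finite sequence x."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the all-pole normal equations by the Levinson-Durbin recursion.

    Returns (a, err) where x[n] is predicted as sum(a[k-1] * x[n-k]) for
    k = 1..order, and err is the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err  # reflection (partial correlation) coefficient
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= 1.0 - k_i * k_i
    return a[1:], err


# Impulse response of an assumed stable all-pole filter:
# h[n] = 1.2 * h[n-1] - 0.5 * h[n-2] + impulse[n]
a_true = [1.2, -0.5]
h = []
for n in range(200):
    value = 1.0 if n == 0 else 0.0
    for k, a_k in enumerate(a_true, start=1):
        if n - k >= 0:
            value += a_k * h[n - k]
    h.append(value)

a_est, err = levinson_durbin(autocorrelation(h, 2), 2)
```

Because the test signal is itself the impulse response of a second-order all-pole filter that has decayed to a negligible level within the frame, the estimated coefficients `a_est` should closely match `a_true`.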

4,206 citations


"PARSHL: An analysis/synthesis progr..." refers methods in this paper

  • ...Other ways to measure formant envelopes include cepstral smoothing [15] and the fitting of low-order LPC models to the inverse FFT of the squared magnitude of the spectrum [9]....

    [...]

Book
02 Dec 2011
TL;DR: Speech Analysis and Synthesis Models: Basic Physical Principles, Speech Synthesis Structures, and Considerations in Choice of Analysis.
Abstract: 1. Introduction.- 1.1 Basic Physical Principles.- 1.2 Acoustical Waveform Examples.- 1.3 Speech Analysis and Synthesis Models.- 1.4 The Linear Prediction Model.- 1.5 Organization of Book.- 2. Formulations.- 2.1 Historical Perspective.- 2.2 Maximum Likelihood.- 2.3 Minimum Variance.- 2.4 Prony's Method.- 2.5 Correlation Matching.- 2.6 PARCOR (Partial Correlation).- 2.6.1 Inner Products and an Orthogonality Principle.- 2.6.2 The PARCOR Lattice Structure.- 3. Solutions and Properties.- 3.1 Introduction.- 3.2 Vector Spaces and Inner Products.- 3.2.1 Filter or Polynomial Norms.- 3.2.2 Properties of Inner Products.- 3.2.3 Orthogonality Relations.- 3.3 Solution Algorithms.- 3.3.1 Correlation Matrix.- 3.3.2 Initialization.- 3.3.3 Gram-Schmidt Orthogonalization.- 3.3.4 Levinson Recursion.- 3.3.5 Updating Am(z).- 3.3.6 A Test Example.- 3.4 Matrix Forms.- 4. Acoustic Tube Modeling.- 4.1 Introduction.- 4.2 Acoustic Tube Derivation.- 4.2.1 Single Section Derivation.- 4.2.2 Continuity Conditions.- 4.2.3 Boundary Conditions.- 4.3 Relationship between Acoustic Tube and Linear Prediction.- 4.4 An Algorithm, Examples, and Evaluation.- 4.4.1 An Algorithm.- 4.4.2 Examples.- 4.4.3 Evaluation of the Procedure.- 4.5 Estimation of Lip Impedance.- 4.5.1 Lip Impedance Derivation.- 4.6 Further Topics.- 4.6.1 Losses in the Acoustic Tube Model.- 4.6.2 Acoustic Tube Stability.- 5. 
Speech Synthesis Structures.- 5.1 Introduction.- 5.2 Stability.- 5.2.1 Step-up Procedure.- 5.2.2 Step-down Procedure.- 5.2.3 Polynomial Properties.- 5.2.4 A Bound on |Fm(z)|.- 5.2.5 Necessary and Sufficient Stability Conditions.- 5.2.6 Application of Results.- 5.3 Recursive Parameter Evaluation.- 5.3.1 Inner Product Properties.- 5.3.2 Equation Summary with Program.- 5.4 A General Synthesis Structure.- 5.5 Specific Speech Synthesis Structures.- 5.5.1 The Direct Form.- 5.5.2 Two-Multiplier Lattice Model.- 5.5.3 Kelly-Lochbaum Model.- 5.5.4 One-Multiplier Models.- 5.5.5 Normalized Filter Model.- 5.5.6 A Test Example.- 6. Spectral Analysis.- 6.1 Introduction.- 6.2 Spectral Properties.- 6.2.1 Zero Mean All-Pole Model.- 6.2.2 Gain Factor for Spectral Matching.- 6.2.3 Limiting Spectral Match.- 6.2.4 Non-uniform Spectral Weighting.- 6.2.5 Minimax Spectral Matching.- 6.3 A Spectral Flatness Model.- 6.3.1 A Spectral Flatness Measure.- 6.3.2 Spectral Flatness Transformations.- 6.3.3 Numerical Evaluation.- 6.3.4 Experimental Results.- 6.3.5 Driving Function Models.- 6.4 Selective Linear Prediction.- 6.4.1 Selective Linear Prediction (SLP) Algorithm.- 6.4.2 A Selective Linear Prediction Program.- 6.4.3 Computational Considerations.- 6.5 Considerations in Choice of Analysis Conditions.- 6.5.1 Choice of Method.- 6.5.2 Sampling Rates.- 6.5.3 Order of Filter.- 6.5.4 Choice of Analysis Interval.- 6.5.5 Windowing.- 6.5.6 Pre-emphasis.- 6.6 Spectral Evaluation Techniques.- 6.7 Pole Enhancement.- 7. 
Automatic Formant Trajectory Estimation.- 7.1 Introduction.- 7.2 Formant Trajectory Estimation Procedure.- 7.2.1 Introduction.- 7.2.2 Raw Data from A(z).- 7.2.3 Examples of Raw Data.- 7.3 Comparison of Raw Data from Linear Prediction and Cepstral Smoothing.- 7.4 Algorithm 1.- 7.5 Algorithm 2.- 7.5.1 Definition of Anchor Points.- 7.5.2 Processing of Each Voiced Segment.- 7.5.3 Final Smoothing.- 7.5.4 Results and Discussion.- 7.6 Formant Estimation Accuracy.- 7.6.1 An Example of Synthetic Speech Analysis.- 7.6.2 An Example of Real Speech Analysis.- 7.6.3 Influence of Voice Periodicity.- 8. Fundamental Frequency Estimation.- 8.1 Introduction.- 8.2 Preprocessing by Spectral Flattening.- 8.2.1 Analysis of Voiced Speech with Spectral Regularity.- 8.2.2 Analysis of Voiced Speech with Spectral Irregularities.- 8.2.3 The STREAK Algorithm.- 8.3 Correlation Techniques.- 8.3.1 Autocorrelation Analysis.- 8.3.2 Modified Autocorrelation Analysis.- 8.3.3 Filtered Error Signal Autocorrelation Analysis.- 8.3.4 Practical Considerations.- 8.3.5 The SIFT Algorithm.- 9. Computational Considerations in Analysis.- 9.1 Introduction.- 9.2 Ill-Conditioning.- 9.2.1 A Measure of Ill-Conditioning.- 9.2.2 Pre-emphasis of Speech Data.- 9.2.3 Prefiltering before Sampling.- 9.3 Implementing Linear Prediction Analysis.- 9.3.1 Autocorrelation Method.- 9.3.2 Covariance Method.- 9.3.3 Computational Comparison.- 9.4 Finite Word Length Considerations.- 9.4.1 Finite Word Length Coefficient Computation.- 9.4.2 Finite Word Length Solution of Equations.- 9.4.3 Overall Finite Word Length Implementation.- 10. 
Vocoders.- 10.1 Introduction.- 10.2 Techniques.- 10.2.1 Coefficient Transformations.- 10.2.2 Encoding and Decoding.- 10.2.3 Variable Frame Rate Transmission.- 10.2.4 Excitation and Synthesis Gain Matching.- 10.2.5 A Linear Prediction Synthesizer Program.- 10.3 Low Bit Rate Pitch Excited Vocoders.- 10.3.1 Maximum Likelihood and PARCOR Vocoders.- 10.3.2 Autocorrelation Method Vocoders.- 10.3.3 Covariance Method Vocoders.- 10.4 Base-Band Excited Vocoders.- 11. Further Topics.- 11.1 Speaker Identification and Verification.- 11.2 Isolated Word Recognition.- 11.3 Acoustical Detection of Laryngeal Pathology.- 11.4 Pole-Zero Estimation.- 11.5 Summary and Future Directions.- References.

1,945 citations

Book
01 Jan 1977

1,743 citations

Journal ArticleDOI
TL;DR: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves, which forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding.
Abstract: A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction was obtained for a larger class of inputs including: two overlapping, superposed speech waveforms; music waveforms; speech in musical backgrounds; and certain marine biologic sounds. Finally, the analysis/synthesis system forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding [8], [9].
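The cubic phase interpolation described in the abstract can be sketched as follows. The formulas are the commonly cited closed-form solution for a cubic phase track that matches phase and frequency at both frame boundaries and picks the "maximally smooth" phase unwrapping. The function names and the numeric frame values are assumptions for illustration only.

```python
import math


def cubic_phase_coeffs(theta0, omega0, theta1, omega1, T):
    """Coefficients of theta(t) = theta0 + omega0*t + alpha*t**2 + beta*t**3.

    theta0, omega0: phase (rad) and frequency (rad/sample) at the frame start.
    theta1, omega1: measured phase and frequency at the frame end, T samples on.
    m is the integer number of 2*pi turns that makes the track smoothest.
    """
    m = round(
        ((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2)
        / (2 * math.pi)
    )
    d = theta1 - theta0 - omega0 * T + 2 * math.pi * m
    alpha = 3 * d / T**2 - (omega1 - omega0) / T
    beta = -2 * d / T**3 + (omega1 - omega0) / T**2
    return alpha, beta, m


def phase_at(t, theta0, omega0, alpha, beta):
    """Evaluate the interpolated phase at sample offset t."""
    return theta0 + omega0 * t + alpha * t * t + beta * t**3


# Assumed example values for one frame of one sinusoidal track.
theta0, omega0 = 0.3, 0.20
theta1, omega1 = 1.1, 0.22
T = 128
alpha, beta, m = cubic_phase_coeffs(theta0, omega0, theta1, omega1, T)
end_phase = phase_at(T, theta0, omega0, alpha, beta)
end_freq = omega0 + 2 * alpha * T + 3 * beta * T**2  # derivative of phase at t = T
```

At the end of the frame the interpolated phase equals the measured phase plus a whole number of turns, and its derivative equals the measured frequency, so consecutive frames join without phase or frequency discontinuities.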

1,659 citations


"PARSHL: An analysis/synthesis progr..." refers background or methods in this paper

  • ...When phase must be matched in a given frame, the frequency can instead move quadratically across the frame to provide cubic polynomial phase interpolation [12], or a second linear breakpoint can be introduced somewhere in the frame for the frequency trajectory....

    [...]

  • ...We will not go into the details of solving this equation since McAulay and Quatieri [12] go through every step....

    [...]

  • ...From our work it is still not clear how important is the phase information in the case of resynthesis without modifications, but McAulay and Quatieri [12] have shown the importance of phase in the case of speech resynthesis....

    [...]

  • ...Phase support was added much later by the second author in the context of his Ph.D. research, based on the work of McAulay and Quatieri [12]....

    [...]
