Showing papers on "Pitch detection algorithm" published in 1988


Journal ArticleDOI
TL;DR: It is argued that the favorable performance of the subharmonic-summation algorithm stems from its corresponding more closely with current pitch-perception theories than does the harmonic sieve.
Abstract: In order to account for the phenomenon of virtual pitch, various theories assume implicitly or explicitly that each spectral component introduces a series of subharmonics. The spectral-compression method for pitch determination can be viewed as a direct implementation of this principle. The widespread application of this principle in pitch determination is, however, impeded by numerical problems with respect to accuracy and computational efficiency. A modified algorithm is described that solves these problems. Its performance is tested for normal speech and "telephone" speech, i.e., speech high-pass filtered at 300 Hz. The algorithm outperforms the harmonic-sieve method for pitch determination, while its computational requirements are about the same. The algorithm is described in terms of nonlinear system theory, i.e., subharmonic summation. It is argued that the favorable performance of the subharmonic-summation algorithm stems from its corresponding more closely with current pitch-perception theories than does the harmonic sieve.

340 citations
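
The subharmonic-summation idea in the entry above can be illustrated with a toy sketch. This is not the paper's modified algorithm: the spectrum is a hypothetical dictionary of integer-Hz components, lookups require an exact frequency match, and the values `n_harmonics=15` and `decay=0.84` are illustrative assumptions rather than figures taken from the abstract. Each candidate fundamental collects the amplitude found at its integer multiples, with higher harmonics weighted less.

```python
def subharmonic_sum(spectrum, candidates, n_harmonics=15, decay=0.84):
    """Return the candidate f0 (Hz) with the largest subharmonic sum.

    spectrum: dict mapping integer frequency in Hz -> amplitude
              (a simplification; a real spectrum is sampled on a grid).
    Each candidate f collects the amplitude at n * f, weighted by
    decay ** (n - 1), for n = 1 .. n_harmonics.
    """
    def score(f):
        return sum(decay ** (n - 1) * spectrum.get(n * f, 0.0)
                   for n in range(1, n_harmonics + 1))

    return max(candidates, key=score)


# "Telephone" speech: a 200 Hz fundamental lies below the 300 Hz
# high-pass cutoff, so only the harmonics at 400, 600 and 800 Hz remain.
telephone_spectrum = {400: 1.0, 600: 1.0, 800: 1.0}
estimate = subharmonic_sum(telephone_spectrum, range(50, 501))
print(estimate)
```

In this toy case the missing fundamental still scores highest, because three components agree on 200 Hz while only two agree on 400 Hz.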


Patent
18 Jul 1988
TL;DR: A guitar synthesizer shortens the time delay between picking a string and sounding a tone by storing the detected pitch of each string together with a reference pitch, and using both to correct the pitch detected during performance.
Abstract: PURPOSE: To shorten the time delay between picking a string and sounding by storing pitch information corresponding to a fundamental frequency together with the corresponding pitch information of the string, etc., changing the pitch information of the string, etc., during performance using both sets of pitch information, and determining the musical sound pitch. CONSTITUTION: In a guitar synthesizer, a changeover switch 6 is set to a storage mode and a string is picked with every fret open. This vibration is detected as a pitch A by a pitch detector 4 and stored in a memory 8. When the switch 6 is switched to a musical performance mode and a performer picks the string while holding down a fret, its pitch P is detected by the pitch detector 4 and supplied to an arithmetic means 10. The means 10 calculates the pitch of the musical sound to be sounded as P' = P·B/A from the pitch P, the stored pitch A in the memory 8, and a pitch B from a reference pitch memory 12, and a sound source 14 generates a musical sound based on it. In this way, the sound source 14 can avoid a time delay between picking the string and sounding.

28 citations
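
The correction P' = P·B/A from the patent abstract above can be worked through in a few lines. The function name and the example numbers are hypothetical, and pitches are assumed here to be frequencies in Hz (the ratio works the same way for any consistent unit).

```python
def corrected_pitch(p, a, b):
    """Apply P' = P * B / A.

    p: pitch detected while playing (Hz, assumed unit)
    a: pitch stored for the open string in storage mode
    b: reference pitch for that string
    """
    if a <= 0:
        raise ValueError("stored open-string pitch must be positive")
    return p * b / a


# Hypothetical example: the open low E string was stored at 82.0 Hz,
# slightly flat of an 82.41 Hz reference. A fretted note detected at
# 164.0 Hz is scaled by the same ratio.
sounded = corrected_pitch(164.0, 82.0, 82.41)
print(round(sounded, 2))
```

The ratio B/A acts as a per-string tuning correction, so a string that is slightly out of tune still triggers a tone at the intended pitch.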


PatentDOI
Hubert Crepy1, Philippe Elie1, Claude Galand1, Emmanuel Lancon1, Thierry Liethoudt1, Michele Rosso1 
TL;DR: In this paper, a pitch detector is used to adjust long term prediction in a pulse excitation speech coder, and a residual signal is derived from the speech signal s(n) by short term filtering.
Abstract: A pitch detector is used to adjust long term prediction in a pulse excitation speech coder. A residual signal r(n) is first derived from the speech signal s(n) by short term filtering. Then, r(n) is processed to calculate a prediction error signal e(n), which is subsequently pulse excitation encoded. The processing of e(n) entails prediction of a residual by measuring a pitch related factor M in two steps: first calculating a coarse M value through peak clipping and sign transition detection, and then adjusting the M value by autocorrelation calculations about the roughly spaced peaks.

25 citations
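
The two-step search for the pitch related factor M described above (a coarse value, then an autocorrelation refinement around it) might look roughly like the sketch below. The patent abstract does not give the details, so the clipping ratio, the onset-spacing rule, and the search width are all assumptions, and the function names are hypothetical.

```python
import math


def coarse_lag(x, clip_ratio=0.6):
    """Coarse period: mean spacing between onsets of the clipped peaks."""
    threshold = clip_ratio * max(x)
    above = [v > threshold for v in x]
    # A transition from "below" to "above" marks the start of a peak.
    onsets = [i for i in range(1, len(x)) if above[i] and not above[i - 1]]
    if len(onsets) < 2:
        return None
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    return round(sum(gaps) / len(gaps))


def refine_lag(x, coarse, search=3):
    """Refine the coarse lag with autocorrelation over nearby lags only."""
    def autocorr(lag):
        return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

    lags = range(max(1, coarse - search), coarse + search + 1)
    return max(lags, key=autocorr)


# Synthetic residual-like test signal with a 40-sample period.
signal = [math.sin(2 * math.pi * n / 40) + 0.5 * math.sin(4 * math.pi * n / 40)
          for n in range(400)]
coarse = coarse_lag(signal)
refined = refine_lag(signal, coarse)
print(coarse, refined)
```

Restricting the autocorrelation to a few lags around the coarse value is what keeps the second step cheap compared with a full-range search.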


PatentDOI
TL;DR: A transform coder operates on a sampled speech signal transformed from the time domain to a frequency domain to develop pitch information in relation to a given speech signal.
Abstract: A transform coder operates on a sampled speech signal transformed from the time domain to a frequency domain to develop pitch information in relation to a given speech signal. The coder segregates groups of information samples into blocks, transforms each block of samples, and generates an auto-correlation function of the transformed signal for each block. Next, the coder determines the pitch period and pitch gain from the auto-correlation function, and determines the striation magnitude and energy from the pitch period and pitch gain. Then a reference pitch model including a number of data points is retrieved from data memory. A striation scaling factor is generated in response to the striation magnitude and energy, and is multiplied by each of the retrieved data points to adaptively generate a pitch model. Finally, the adaptively determined model is sampled to establish the pitch information.

23 citations


Proceedings ArticleDOI
24 Jun 1988
TL;DR: An accurate silence-unvoiced-voiced classification and pitch detection algorithm is described and its implementation for real-time applications on a Texas Instruments TMS320C25 digital signal processor is evaluated.
Abstract: An accurate silence-unvoiced-voiced classification and pitch detection algorithm is described and its implementation for real-time applications on a Texas Instruments TMS320C25 digital signal processor is evaluated. Speech classification is separated into silence detection and voiced-unvoiced classification. Only the signal's energy level and zero-crossing rate are used in both classification processes. Pitch detection need only operate on voiced periods of speech. A peak picking technique is used to successively home in on the peaks that bound the pitch periods. Tests are performed on the found peaks to ensure that they are pitch-period peaks. A real-time implementation strategy is developed that combines silence detection with the signal acquisition and tightly couples voiced-unvoiced classification with pitch detection. The silence detection task is interrupt-driven and the pitch detection task loops continuously. The execution speed and accuracy results for this algorithm are shown to compare favorably with those for other such algorithms published in the literature.

15 citations
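
A classifier that uses only energy and zero-crossing rate, as the entry above describes, can be sketched as follows. The thresholds `silence_energy` and `voiced_zcr` are hypothetical values chosen for the synthetic frames below; the paper's actual decision rules are not given in the abstract.

```python
import math


def classify_frame(frame, silence_energy=1e-3, voiced_zcr=0.25):
    """Label a frame "silence", "voiced" or "unvoiced".

    Uses mean energy for the silence test and the fraction of adjacent
    sample pairs that change sign (zero-crossing rate) for the
    voiced/unvoiced test. Both thresholds are illustrative assumptions.
    """
    energy = sum(v * v for v in frame) / len(frame)
    if energy < silence_energy:
        return "silence"
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return "voiced" if zcr < voiced_zcr else "unvoiced"


# 20 ms frames at an assumed 8 kHz sampling rate.
quiet = [0.0] * 160
vowel_like = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
hiss_like = [0.3 * (-1) ** n for n in range(160)]
print(classify_frame(quiet), classify_frame(vowel_like), classify_frame(hiss_like))
```

Running the silence test first means the zero-crossing count is only computed for frames that carry signal, which matches the abstract's split into silence detection followed by voiced-unvoiced classification.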


Journal ArticleDOI
TL;DR: The design and implementation of a parallel-processing-based pitch detector is presented and the results show that the pitch detector performance is maintained in the real-time implementation mainly because the majority of the algorithm computations are integer arithmetic and logic-type operations.
Abstract: The design and implementation of a parallel-processing-based pitch detector is presented. Pitch information is extracted by performing pitch detection on four different waveforms derived from the speech signal. Pitch information from the four pitch-detection processes is then combined to determine a final pitch estimate. The performance of this pitch detector is evaluated on a large database and compared to other well-known pitch detection algorithms. It has been implemented in real time on a TMS32020 fixed-point digital signal processor as part of a 2.4 kb/s vocoder. A performance comparison of the real-time fixed-point implementation and a computer simulation are also given. The results show that the pitch detector performance is maintained in the real-time implementation mainly because the majority of the algorithm computations are integer arithmetic and logic-type operations.

13 citations


Proceedings ArticleDOI
11 Apr 1988
TL;DR: An algorithm designed to improve results for separating two voices simultaneously recorded on a single channel is presented and a prime factor fast Fourier transform has been developed.
Abstract: An algorithm designed to improve results for separating two voices simultaneously recorded on a single channel is presented. A variable frame size orthogonal transform and a spectral matching technique are used. A multistep pitch detection scheme is proposed which includes a traditional autocorrelation function, a modified autocorrelation, the average magnitude difference function, and a look-forward and look-backward double checking scheme. The orthogonal transforms utilized include the fast Fourier transform and the fast triangular transform. For a variable frame size transform, a prime factor fast Fourier transform has been developed. The execution of the process is automated and implemented on the IBM-PC, VAX 8650, and HP 9000. Intelligibility tests using simple quantitative measures have been performed on the separated signals. An extension of the problem to the three-speaker case is reported.

13 citations
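
Of the pitch cues listed in the entry above, the average magnitude difference function (AMDF) is the simplest to show in isolation. The sketch below is a generic AMDF pitch search, not the paper's multistep scheme; the lag range and the test signal are assumptions.

```python
import math


def amdf(x, lag):
    """Average magnitude difference between x and x shifted by `lag`."""
    n = len(x) - lag
    return sum(abs(x[i] - x[i + lag]) for i in range(n)) / n


def amdf_pitch_lag(x, min_lag, max_lag):
    """Return the lag with the deepest AMDF valley in [min_lag, max_lag]."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(x, lag))


# Synthetic voiced-like signal with a 40-sample period.
signal = [math.sin(2 * math.pi * n / 40) + 0.5 * math.sin(4 * math.pi * n / 40)
          for n in range(400)]
lag = amdf_pitch_lag(signal, 20, 60)
print(lag)
```

Unlike autocorrelation, the AMDF needs only subtractions and absolute values, and the pitch period shows up as a minimum rather than a maximum.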


Patent
Naoaki Matsumoto1
11 Oct 1988
TL;DR: In this paper, a musical tone generating device generates musical tones of a frequency in accordance with pitches which are extracted from input waveform signals by a pitch extracting means, and the frequency of the musical tone is defined on the basis of the calculated average which serves as a current pitch.
Abstract: A musical tone generating device generates musical tones of a frequency in accordance with pitches which are extracted from input waveform signals by a pitch extracting means. When a pitch extracted by the pitch extracting means varies within a range of a predetermined musical interval difference, an average of the currently extracted pitch and the previously extracted pitch is calculated and the frequency of the musical tone is defined on the basis of the calculated average, which serves as the current pitch. On the other hand, when the currently extracted pitch exceeds the above-mentioned range, the frequency of the musical tone is defined on the basis of the currently extracted pitch. In this manner, the undesirable influence on the sound frequency of unnecessary variations or fluctuations in the pitch is decreased or eliminated, so that a steady sound frequency can be produced. In addition, when the pitch is intentionally altered, the frequency of the musical tone is instantly changed in response to the pitch alteration.

12 citations
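
The averaging rule in the patent abstract above can be sketched as a single function. The 50-cent threshold, the use of cents to measure the musical interval, and the arithmetic mean are assumptions made for illustration; the abstract only speaks of "a predetermined musical interval difference" and "an average".

```python
import math


def next_pitch(current, previous, max_interval_cents=50.0):
    """Smooth small pitch fluctuations, but follow large changes at once.

    current, previous: extracted pitches in Hz (assumed unit).
    If the two pitches lie within max_interval_cents of each other, their
    average is used; otherwise the current pitch is used unchanged.
    """
    interval = abs(1200.0 * math.log2(current / previous))
    if interval <= max_interval_cents:
        return (current + previous) / 2.0
    return current


print(next_pitch(442.0, 440.0))  # small wobble: averaged
print(next_pitch(880.0, 440.0))  # octave jump: followed immediately
```

A fixed interval threshold gives the same tolerance at every register, which is why the comparison is made on a ratio rather than on a difference in Hz.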


Proceedings ArticleDOI
20 Mar 1988
TL;DR: The initial results obtained by comparing the average gross pitch error rate suggest that PEA Hash 2 is better (by a factor of two or more) than PEA Hash 1.
Abstract: Two computationally simple pitch-extraction algorithms based on the autocorrelation method of pitch determination are presented. Both algorithms have been implemented in software, and their performance has been evaluated. The first pitch-extraction algorithm (PEA Hash 1) uses center clipping and infinite peak clipping for time-domain preprocessing before computing the autocorrelation, while the second algorithm (PEA Hash 2) nonlinearly distorts the speech signal before center clipping and autocorrelation computation. PEA Hash 2 provides a better pitch detection estimate than PEA Hash 1 and also eliminates the need to adjust the clipping level threshold critically. The initial results obtained by comparing the average gross pitch error rate suggest that PEA Hash 2 is better (by a factor of two or more) than PEA Hash 1.

9 citations
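
Center clipping combined with infinite peak clipping, as used by the first algorithm above, reduces each sample to +1, 0 or -1 before the autocorrelation is computed. The sketch below shows that generic preprocessing step; the clipping ratio of 0.6 and the lag range are assumptions, and the nonlinear distortion used by the second algorithm is not reproduced.

```python
import math


def three_level_clip(x, ratio=0.6):
    """Map samples above +level to 1, below -level to -1, others to 0."""
    level = ratio * max(abs(v) for v in x)
    return [1 if v > level else -1 if v < -level else 0 for v in x]


def clipped_autocorr_lag(x, min_lag, max_lag):
    """Pitch lag from the autocorrelation of the three-level signal."""
    clipped = three_level_clip(x)

    def autocorr(lag):
        return sum(clipped[i] * clipped[i + lag]
                   for i in range(len(clipped) - lag))

    return max(range(min_lag, max_lag + 1), key=autocorr)


# Synthetic voiced-like signal with a 40-sample period.
signal = [math.sin(2 * math.pi * n / 40) + 0.5 * math.sin(4 * math.pi * n / 40)
          for n in range(400)]
lag = clipped_autocorr_lag(signal, 20, 60)
print(lag)
```

Because the clipped signal only takes the values -1, 0 and 1, every product in the autocorrelation is a sign comparison, which is what makes the method computationally simple. The result does depend on the clipping level, the sensitivity the abstract says the second algorithm removes.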


Proceedings ArticleDOI
11 Apr 1988
TL;DR: A robust method is described for the determination of pitch in voiced speech that measures the regular harmonic spacing in the spectrum of the speech signal by applying an autocorrelation to the power spectral density.
Abstract: A robust method is described for the determination of pitch in voiced speech. The spectral autocorrelation function measures the regular harmonic spacing in the spectrum of the speech signal by applying an autocorrelation to the power spectral density. It is shown that this method is capable of accurate pitch tracking in additive noise. The preprocessing is critical for good performance, and in this case the speech spectrum is flattened by obtaining the residual from linear prediction analysis of the original speech. Application of suitable postprocessing enables this method to operate over a wide range of human pitch down to very low signal-to-noise ratios.

7 citations
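
The core of the spectral autocorrelation method above is that harmonics spaced f0 apart make the power spectrum line up with itself when shifted by f0. The sketch below applies that idea to a hand-made power spectrum; the linear-prediction flattening and the postprocessing from the abstract are omitted, and the bin width and search range are assumptions.

```python
def spectral_autocorr_f0(power, bin_hz, min_bins, max_bins):
    """Estimate f0 as the frequency shift that best aligns the spectrum.

    power:  power spectral density sampled on equally spaced bins
    bin_hz: spacing between bins in Hz
    The search covers shifts of min_bins .. max_bins bins.
    """
    def autocorr(shift):
        return sum(power[k] * power[k + shift]
                   for k in range(len(power) - shift))

    best_shift = max(range(min_bins, max_bins + 1), key=autocorr)
    return best_shift * bin_hz


# Hypothetical flattened spectrum: harmonics every 5 bins on a 25 Hz grid,
# i.e. a 125 Hz fundamental, over a small noise floor.
power = [1.0 if k > 0 and k % 5 == 0 else 0.01 for k in range(64)]
f0 = spectral_autocorr_f0(power, 25.0, 2, 20)
print(f0)
```

Equal harmonic amplitudes make the peak at the true spacing stand out, which is one way to see why the abstract calls spectral flattening critical.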


Proceedings ArticleDOI
Y. Medan1, E. Yair1, D. Chazan1
14 Nov 1988
TL;DR: An accurate, robust, and reliable pitch detection scheme is outlined that makes it possible to analyze the pitch variation process in dysarthric patients for medical purposes and to develop an accurate synchronous spectral analysis scheme.
Abstract: An accurate, robust, and reliable pitch detection scheme is outlined. The algorithm extracts the pitch with a very high resolution (i.e., as a real number) despite the finite (8-kHz) resolution of the sampled speech sequence. This makes it possible to analyze the pitch variation process in dysarthric patients for medical purposes and to develop an accurate synchronous spectral analysis scheme. The computational complexity of the proposed algorithm is well below the capacity of modern digital signal processing technology and therefore can be implemented in real-time.
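
The abstract above does not say how the real-valued pitch is obtained. As a stand-in, the sketch below uses a common generic technique, parabolic interpolation around the autocorrelation peak, to show how a pitch period can be estimated between sample positions. It is not the authors' scheme, and the window length, lag range, and test signal are assumptions.

```python
import math


def fractional_lag(x, min_lag, max_lag, window):
    """Estimate a non-integer pitch lag by parabolic interpolation.

    The autocorrelation is evaluated over a fixed window at integer lags;
    a parabola through the best lag and its two neighbours gives the
    sub-sample offset. Requires len(x) >= window + max_lag + 1.
    """
    def autocorr(lag):
        return sum(x[i] * x[i + lag] for i in range(window))

    k = max(range(min_lag, max_lag + 1), key=autocorr)
    y0, y1, y2 = autocorr(k - 1), autocorr(k), autocorr(k + 1)
    denom = y0 - 2.0 * y1 + y2
    offset = 0.0 if denom == 0 else 0.5 * (y0 - y2) / denom
    return k + offset


# Test tone whose period (40.5 samples) falls between two integer lags.
fs = 8000
signal = [math.sin(2 * math.pi * n / 40.5) for n in range(200)]
lag = fractional_lag(signal, 20, 60, 81)
print(lag, fs / lag)
```

An integer lag of 40 or 41 would correspond to 200 Hz or about 195 Hz at 8 kHz, so resolving the period between samples matters when small pitch variations are of interest.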

DOI
01 Apr 1988
TL;DR: An improved algorithm, based on the temporal investigation of the speech waveform, is described and evaluated and shown to offer improved performance in both long term pitch period estimation accuracy, and cycle to cycle accuracy.
Abstract: The paper examines the performance of several versions of the parallel processing method of pitch period estimation of speech, highlighting the limitations of each. An improved algorithm, based on the temporal investigation of the speech waveform, is described and evaluated. It is shown to offer improved performance in both long term pitch period estimation accuracy, and cycle to cycle accuracy. A new method of evaluating the performance of pitch detection algorithms, based on measuring their stability with respect to variation of the time origin of the input speech, is also described.

Journal ArticleDOI
TL;DR: It is shown that synthetic speech generated using excitation pulses which resemble the true glottal volume-velocity excitation waveform is preferred over speech synthesized using a two-pole glottal filter and impulse excitation.
Abstract: This paper describes analysis and synthesis methods for a digital formant synthesizer. It is shown that synthetic speech generated using excitation pulses which resemble the true glottal volume-velocity excitation waveform is preferred over speech synthesized using a two-pole glottal filter and impulse excitation. Listeners also ranked speech tokens where the excitation source incorporated the effects of source-tract interaction higher in naturalness relative to tokens where the interaction was absent. A series of algorithms for voiced/unvoiced/mixed/silent interval classification, pitch detection, and formant estimation and tracking are described. These algorithms utilize two channels of input data, viz., the speech and the electroglottographic signals, and thereby achieve superior performance in comparison to single-channel, speech-only algorithms. We have also initiated an investigation into the feasibility of using the digital formant synthesizer to study the acoustic correlates of voice quality. A number...

Proceedings ArticleDOI
11 Apr 1988
TL;DR: An algorithm is given for estimating this stretch and converting it to rate of change of pitch and results are presented showing the accuracy of the method.
Abstract: Spectra formed from adjacent speech segments are likely to be similar, particularly if the pitch is constant. If the pitch is going up, the peaks in the later spectrum will be farther apart; the second peak will be very like a stretched version of the first. An algorithm is given for estimating this stretch and converting it to rate of change of pitch. Results are presented showing the accuracy of the method.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: An approach to time-domain pitch detection based on the concept of center of mass, applied to excursions (bumps) in speech signals, is presented and evaluated; results show that the algorithm is robust and accurate.
Abstract: An approach to time-domain pitch detection based on the concepts of center of mass is presented and evaluated. Excursions, or bumps, in speech signals are treated as geometric areas and replaced by their total mass lumped at the centers of mass. Intervals between masses are grouped into candidate classes. Coincidence and coherence indices of these classes are computed to determine the most likely pitch estimate. Postprocessing consists of a simple error-correction and silence-detection scheme. This algorithm compares favourably in performance with the autocorrelation method, using pitch contours from electroglottograph signals as a reference. The algorithm is tested in noisy environments simulated by uniformly distributed white noise and multitalker babble noise. Results show that the algorithm is robust and accurate. The implementation of this algorithm for a vibrotactile device to aid lipreading is described.
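
The first step of the approach above, replacing each excursion by a mass at its center of mass, can be sketched as follows. Splitting the signal at sign changes and using the sum of absolute sample values as the area are assumptions; the grouping into candidate classes and the coincidence and coherence indices are not specified in the abstract and are not attempted here.

```python
def excursion_masses(x):
    """Return (center, mass) for each same-sign excursion of x.

    mass:   sum of absolute sample values in the excursion (its area)
    center: mass-weighted mean sample index of the excursion
    Zero samples are treated as non-negative.
    """
    masses = []
    start = 0
    for i in range(1, len(x) + 1):
        # Close the current excursion at the end of the signal or when
        # the sign differs from the sign at the excursion's first sample.
        if i == len(x) or (x[i] >= 0) != (x[start] >= 0):
            segment = x[start:i]
            mass = sum(abs(v) for v in segment)
            if mass > 0:
                center = sum((start + j) * abs(v)
                             for j, v in enumerate(segment)) / mass
                masses.append((center, mass))
            start = i
    return masses


# One positive bump followed by one negative bump.
print(excursion_masses([1, 2, 1, -1, -1]))
```

Intervals between the centers of successive large masses would then serve as pitch-period candidates, in the spirit of the "intervals between masses" mentioned in the abstract.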

01 Jan 1988
TL;DR: In this paper, a robust method for the determination of pitch in voiced speech is described, which measures the regular harmonic spacing in the spectrum of the speech signal by applying an autocorrelation to the power spectral density.
Abstract: This paper describes a robust method for the determination of pitch in voiced speech. The spectral autocorrelation function measures the regular harmonic spacing in the spectrum of the speech signal by applying an autocorrelation to the power spectral density. It is shown that this method is capable of accurate pitch tracking in additive noise. The preprocessing is critical for good performance, and in this case the speech spectrum is flattened by obtaining the residual from linear prediction analysis of the original speech. Application of suitable post-processing enables this method to operate over a wide range of human pitch down to very low signal-to-noise ratios.

Patent
20 Jan 1988
TL;DR: In this paper, a pitch data detecting means is used to adjust long term predictive means in a pulse excitation speech coder, where a residual signal r(n) is first derived from the speech signal s(n) through short term filtering, and r(n) is then processed to provide a prediction error signal e(n).
Abstract: A pitch data detecting means is used to adjust long term predictive means in a pulse excitation speech coder. A residual signal r(n) is first derived from the speech signal s(n) through short term filtering; r(n) is then processed to provide a prediction error signal e(n) to be pulse excitation encoded. The generation of e(n) involves predicting a residual through Long Term Prediction operations, including measuring a pitch related factor M through a two-step process: a first step providing a coarse M value through peak clipping and sign transition detection, and a second step adjusting said M to a finer value through autocorrelations operating about the roughly spaced peaks.