Proceedings Article

Pitch tracking in reverberant environments

TL;DR: This paper compares Neural Network (NN) based approaches such as the Subband Autocorrelation Classifier (SAcC) with signal processing based methods such as YIN and RAPT, and shows that multi-style training of the NN using the CC+SAcC feature outperforms all the other methods.
Abstract: Pitch, or fundamental frequency, estimation is an important problem in speech processing. Research on pitch extraction has a long history, and numerous algorithms have been developed over the years to improve its accuracy. Estimation becomes more difficult in the presence of additive noise and reverberation, because they corrupt the periodicity information that is vital for estimating the pitch. In this paper, we present a quantitative analysis of pitch tracking in the presence of reverberation by different state-of-the-art methods. We compare Neural Network (NN) based approaches such as the Subband Autocorrelation Classifier (SAcC) with signal processing based methods such as YIN and RAPT. We enhance the performance of SAcC by introducing a cross-correlogram feature (CC+SAcC). We further show that multi-style training of the NN using the CC+SAcC feature outperforms all the other methods. Experiments were conducted on the Keele and TIMIT databases, artificially reverberated with room impulse responses of varying T60 values.
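The abstract describes test data made by convolving clean Keele and TIMIT speech with room impulse responses (RIRs) of varying T60. The paper's own RIRs are not described here, so the sketch below assumes a common synthetic stand-in: Gaussian noise under an exponential envelope whose energy falls by 60 dB after T60 seconds, with a unit direct-path impulse at lag 0. The names `synthetic_rir`, `convolve` and `reverberate` are hypothetical.

```python
import math
import random


def synthetic_rir(t60, fs, length_s=None, seed=0):
    """Hypothetical exponentially decaying noise RIR (a stand-in, not a measured RIR).

    The amplitude envelope 10 ** (-3 * t / t60) is 60 dB down in energy at t = t60.
    """
    if length_s is None:
        length_s = t60  # by default, truncate at the -60 dB point
    n = max(1, int(round(length_s * fs)))
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) * 10 ** (-3.0 * (i / fs) / t60) for i in range(n)]
    h[0] = 1.0  # direct path as a unit impulse at lag 0
    peak = max(abs(v) for v in h)
    return [v / peak for v in h]  # normalise so the largest tap is +/-1


def convolve(x, h):
    """Full linear convolution: len(x) + len(h) - 1 output samples, O(N*M)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def reverberate(x, t60, fs, seed=0):
    """Reverberant copy of x, trimmed to the length of the clean signal."""
    return convolve(x, synthetic_rir(t60, fs, seed=seed))[: len(x)]
```

Pure Python keeps the sketch self-contained; for full utterances an FFT-based convolution would be the practical choice, and measured or image-method RIRs would replace the noise model.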
Citations
Journal Article
01 Jan 2020
TL;DR: Several methods for pitch frequency estimation are investigated and compared on clean and reverberant male and female speech signals to select the one least affected by reverberation.
Abstract: Reverberation is an effect that occurs regularly in closed rooms because of multiple reflections. This paper investigates the effect of reverberation on both male and female speech signals. This effect is reflected in the pitch frequency of speech signals, a parameter that is important because it is commonly used for speaker identification. Hence, several methods for pitch frequency estimation are investigated and compared on clean and reverberant male and female speech signals to select the one least affected by reverberation.

Cites background from "Pitch tracking in reverberant envir..."

  • ...It transfers multiple characteristics of the information transmitted by speech signal [4], so the pitch is the auditory quality of sound; it is a perceived fundamental frequency of sound....

References
Journal Article
TL;DR: Discusses the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room; convolving this impulse response with any desired input signal simulates room reverberation of that signal.
Abstract: Image methods are commonly used for the analysis of the acoustic properties of enclosures. In this paper we discuss the theoretical and practical use of image techniques for simulating, on a digital computer, the impulse response between two points in a small rectangular room. The resulting impulse response, when convolved with any desired input signal, such as speech, simulates room reverberation of the input signal. This technique is useful in signal processing or psychoacoustic studies. The entire process is carried out on a digital computer so that a wide range of room parameters can be studied with accurate control over the experimental conditions. A FORTRAN implementation of this model has been included.

3,720 citations


"Pitch tracking in reverberant envir..." refers background in this paper

  • ...Beginning from the image model [4], various algorithms have originated over the past few decades that allow the simulation and modeling of reverberant environments....
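The image-method abstract above can be illustrated with a simplified sketch. It is a hypothetical implementation under stated assumptions, not the FORTRAN program the abstract mentions: a shoebox room, one frequency-independent reflection coefficient `beta` shared by all six walls, a speed of sound of 343 m/s, each image's arrival rounded to the nearest sample, and images enumerated up to `order` lattice steps per axis. Each image contributes `beta ** reflections / (4 * pi * distance)`.

```python
import itertools
import math


def image_method_rir(room, src, mic, beta, fs, n_samples, c=343.0, order=10):
    """Hypothetical image-source RIR for a rectangular (shoebox) room.

    room, src, mic: (x, y, z) in metres. beta: reflection coefficient for every wall.
    """
    h = [0.0] * n_samples
    lattice = range(-order, order + 1)
    for n in itertools.product(lattice, repeat=3):      # room-size translations per axis
        for q in itertools.product((0, 1), repeat=3):   # 1 = mirror the source on this axis
            dist_sq = 0.0
            n_refl = 0
            for axis in range(3):
                img = (1 - 2 * q[axis]) * src[axis] + 2 * n[axis] * room[axis]
                dist_sq += (img - mic[axis]) ** 2
                # wall hits on this axis: |n - q| on one wall plus |n| on the opposite wall
                n_refl += abs(n[axis] - q[axis]) + abs(n[axis])
            d = math.sqrt(dist_sq)
            k = int(round(d / c * fs))  # arrival time rounded to the nearest sample
            if d > 0 and 0 <= k < n_samples:
                h[k] += beta ** n_refl / (4 * math.pi * d)  # spherical spreading loss
    return h
```

With `beta = 0` only the direct path survives, which is a quick sanity check; raising `beta` toward 1 lengthens the decay and therefore the T60 of the simulated room.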

Journal Article
TL;DR: An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds, based on the well-known autocorrelation method with a number of modifications that combine to prevent errors.
Abstract: An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.

1,975 citations


"Pitch tracking in reverberant envir..." refers background or methods in this paper

  • ...The first one is a time domain approach that uses properties like autocorrelation [2,3] and phase space properties....

  • ...Experimental Setup The Yin [3] Wu [5], RAPT [2], SWIPE [15] and YAAPT [14] algorithms are used for comparison of the pitch tracks Set Name Description...

  • ...From Figure 6(a) we can observe the pitch tracking errors for Yin [3], Wu [5] and the RAPT [2] and SAcC [9] trained on the clean Keele corpus....

  • ...The YIN [3] algorithm uses the squared difference function based on ACF to identify pitch candidates....

  • ...We compare Neural Network (NN) based approaches such as the Subband Autocorrelation Classifier (SAcC) with signal processing based methods such as YIN and RAPT....
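The YIN abstract and the citing paper's note that YIN builds on a squared difference function related to the ACF can be illustrated with a minimal single-frame sketch of YIN's core steps: the difference function d(τ), the cumulative mean normalized difference d'(τ), an absolute threshold, and parabolic interpolation. Assumptions: a 60 to 500 Hz search range, threshold 0.1, interpolation applied to d'(τ) for simplicity, and no best-local-estimate step. It is not the reference implementation.

```python
def parabolic_min(values, i):
    """Sub-sample position of the vertex of a parabola through i-1, i, i+1."""
    if i <= 0 or i >= len(values) - 1:
        return float(i)
    a, b, c = values[i - 1], values[i], values[i + 1]
    denom = a - 2 * b + c
    return i + 0.5 * (a - c) / denom if denom > 0 else float(i)


def yin_f0(frame, fs, fmin=60.0, fmax=500.0, threshold=0.1):
    """Estimate F0 (Hz) of one frame with YIN's core steps; None means 'unvoiced'."""
    tau_min = max(1, int(fs / fmax))
    tau_max = min(int(fs / fmin), len(frame) // 2)
    if tau_max <= tau_min:
        return None
    w = len(frame) - tau_max  # same number of products for every lag
    # Difference function d(tau) = sum_j (x_j - x_{j+tau})^2
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(w))
    # Cumulative mean normalized difference d'(tau), with d'(0) = 1
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0
    # Absolute threshold: first lag below it, then follow that dip to its minimum
    for tau in range(tau_min, tau_max + 1):
        if cmnd[tau] < threshold:
            while tau < tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return fs / parabolic_min(cmnd, tau)
    return None
```

Taking the first dip under the threshold, rather than the global minimum, is what lets YIN avoid choosing a multiple of the true period (an octave-too-low error).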

Journal Article
TL;DR: A segregation system that is consistent with psychological and physiological findings and whose performance is significantly better than that of the frame-based segregation scheme described by Meddis and Hewitt (1992).

817 citations

Journal Article
TL;DR: A comparative performance study of seven pitch detection algorithms was conducted on a speech database of eight utterances spoken by three males, three females, and one child, to assess their relative performance as a function of recording condition and of the pitch range of the various speakers.
Abstract: A comparative performance study of seven pitch detection algorithms was conducted. A speech data base, consisting of eight utterances spoken by three males, three females, and one child, was constructed. Telephone, close talking microphone, and wideband recordings were made of each of the utterances. For each of the utterances in the data base, a "standard" pitch contour was semiautomatically measured using a highly sophisticated interactive pitch detection program. The "standard" pitch contour was then compared with the pitch contour that was obtained from each of the seven programmed pitch detectors. The algorithms used in this study were 1) a center clipping, infinite-peak clipping, modified autocorrelation method (AUTOC), 2) the cepstral method (CEP), 3) the simplified inverse filtering technique (SIFT) method, 4) the parallel processing time-domain method (PPROC), 5) the data reduction method (DARD), 6) a spectral flattening linear predictive coding (LPC) method, and 7) the average magnitude difference function (AMDF) method. A set of measurements was made on the pitch contours to quantify the various types of errors which occur in each of the above methods. Included among the error measurements were the average and standard deviation of the error in pitch period during voiced regions, the number of gross errors in the pitch period, and the average number of voiced-unvoiced classification errors. For each of the error measurements, the individual pitch detectors could be rank ordered as a measure of their relative performance as a function of recording condition and pitch range of the various speakers. Performance scores are presented for each of the seven pitch detectors based on each of the categories of error.

793 citations


"Pitch tracking in reverberant envir..." refers background in this paper

  • ...PERFORMANCE METRIC Gross Pitch Error (GPE) and Voicing Decision Error (VDE) are the standard measures to determine errors in pitch tracking [13,18]....
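Two of the seven classical detectors in the comparative study above, the center-clipped, infinite-peak-clipped autocorrelation method (AUTOC) and the average magnitude difference function (AMDF), can be sketched as follows. The clipping level (30% of the frame peak), the voicing rule, the AMDF valley rule and the 60 to 500 Hz search range are illustrative assumptions, not the parameters of that study, and the AMDF sketch makes no voiced/unvoiced decision. All function names are hypothetical.

```python
def lag_range(fs, n, fmin, fmax):
    """Candidate pitch periods, in samples, for a frame of n samples."""
    return range(max(1, int(fs / fmax)), min(int(fs / fmin), n - 1) + 1)


def three_level_clip(frame, clip_fraction=0.3):
    """Center clipping plus infinite-peak clipping: each sample becomes -1, 0 or +1."""
    c = clip_fraction * max(abs(x) for x in frame)
    return [1 if x > c else -1 if x < -c else 0 for x in frame]


def autoc_f0(frame, fs, fmin=60.0, fmax=500.0, clip_fraction=0.3, voicing=0.3):
    """F0 (Hz) from the highest autocorrelation peak of the clipped frame, or None."""
    y = three_level_clip(frame, clip_fraction)
    lags = lag_range(fs, len(y), fmin, fmax)
    r0 = sum(v * v for v in y)  # lag-0 autocorrelation (energy of the clipped frame)
    if r0 == 0 or not lags:
        return None
    r = {tau: sum(y[j] * y[j + tau] for j in range(len(y) - tau)) for tau in lags}
    best = max(lags, key=lambda tau: r[tau])
    # call the frame voiced only if the peak is a sizeable fraction of r(0)
    return fs / best if r[best] >= voicing * r0 else None


def amdf_f0(frame, fs, fmin=60.0, fmax=500.0, depth=0.1):
    """F0 (Hz) from the first deep valley of the average magnitude difference function."""
    n = len(frame)
    lags = lag_range(fs, n, fmin, fmax)
    if not lags:
        return None
    amdf = {tau: sum(abs(frame[j] - frame[j + tau]) for j in range(n - tau)) / (n - tau)
            for tau in lags}
    lo, hi = min(amdf.values()), max(amdf.values())
    cutoff = lo + depth * (hi - lo)  # valleys at least this deep are period candidates
    for tau in lags:
        if amdf[tau] <= cutoff:
            while tau + 1 in amdf and amdf[tau + 1] < amdf[tau]:
                tau += 1  # follow the valley down to its local minimum
            return fs / tau
    return None
```

Picking the first deep AMDF valley instead of the global minimum avoids choosing a multiple of the period, where the valleys are almost equally deep.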

Journal Article
TL;DR: Describes the experiences of researchers at MIT in collecting two large speech databases, TIMIT and VOYAGER, which have somewhat complementary objectives.

570 citations


"Pitch tracking in reverberant envir..." refers background or methods in this paper

  • ...Corpus The Keele corpora [17] was used for training the MLP while the TIMIT corpora [18] was used for testing....

  • ...But a corpus like the TIMIT [18] database possesses samples in which the UE and VE are very different....

  • ...PERFORMANCE METRIC Gross Pitch Error (GPE) and Voicing Decision Error (VDE) are the standard measures to determine errors in pitch tracking [13,18]....

  • ...As mentioned earlier, the training set is Keele [17] and the test set is TIMIT [18]....
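The snippets name Gross Pitch Error (GPE) and Voicing Decision Error (VDE) as the standard pitch-tracking error measures. A minimal sketch follows, assuming frame-aligned F0 tracks in Hz with 0 marking unvoiced frames, and the commonly used 20% relative-deviation threshold for a gross error; the threshold actually used in the paper is not stated in this excerpt, and `gpe_vde` is a hypothetical name.

```python
def gpe_vde(ref_f0, est_f0, tolerance=0.2):
    """Gross Pitch Error and Voicing Decision Error, both returned as fractions.

    GPE: among frames voiced in both tracks, the share whose estimate deviates
         from the reference by more than `tolerance` (relative).
    VDE: among all frames, the share whose voiced/unvoiced decisions disagree.
    """
    if not ref_f0 or len(ref_f0) != len(est_f0):
        raise ValueError("tracks must be non-empty and of equal length")
    both_voiced = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    gross = sum(1 for r, e in both_voiced if abs(e - r) > tolerance * r)
    gpe = gross / len(both_voiced) if both_voiced else 0.0
    vde = sum(1 for r, e in zip(ref_f0, est_f0) if (r > 0) != (e > 0)) / len(ref_f0)
    return gpe, vde
```

For example, with a reference of `[0, 100, 100, 200, 200, 0]` and an estimate of `[0, 105, 150, 0, 200, 120]`, one of the three jointly voiced frames is a gross error and two of the six frames have the wrong voicing decision, so both GPE and VDE are 1/3.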