Proceedings ArticleDOI

Pitch tracking in reverberant environments

07 Dec 2015, pp. 192-196



Citations
Journal ArticleDOI

[...]

01 Jan 2020
TL;DR: Several methods for pitch frequency estimation are investigated and compared on clean and reverberant male and female speech signals, in order to select the one least affected by reverberation.
Abstract: Reverberation is an effect that occurs regularly in closed rooms due to multiple reflections. This paper investigates the effect of reverberation on both male and female speech signals, as reflected in the pitch frequency of the signals. Pitch frequency is an important parameter because it is commonly used for speaker identification. Hence, several methods for pitch frequency estimation are investigated and compared on clean and reverberant male and female speech signals, in order to select the one least affected by reverberation.
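
The comparison described above is easy to prototype. The sketch below is not the paper's procedure, only an illustration of the idea: reverberate a clean recording by convolving it with a room impulse response, run the same pitch tracker on both versions (librosa's YIN implementation stands in for the methods compared in the paper), and count how often the estimate moves appreciably. The file names, frequency range, and 20% deviation threshold are assumptions.

```python
import numpy as np
import librosa
from scipy.signal import fftconvolve

speech, sr = librosa.load("clean.wav", sr=16000)   # hypothetical clean recording
rir, _ = librosa.load("rir.wav", sr=16000)         # hypothetical room impulse response

# Reverberant version: convolve the clean speech with the impulse response.
reverb = fftconvolve(speech, rir)[: len(speech)]
reverb *= np.max(np.abs(speech)) / np.max(np.abs(reverb))   # rough level match

def track(x):
    # Frame-wise F0 estimates over a typical speech range (65-500 Hz).
    return librosa.yin(x, fmin=65, fmax=500, sr=sr, frame_length=1024)

f0_clean = track(speech)
f0_reverb = track(reverb)

# Fraction of frames whose estimate moves by more than 20% once
# reverberation is added (a crude "gross error" style robustness score).
rel_dev = np.abs(f0_reverb - f0_clean) / f0_clean
print("frames deviating >20%%: %.1f%%" % (100 * np.mean(rel_dev > 0.2)))
```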

Cites background from "Pitch tracking in reverberant environments"

  • [...]


References
Journal ArticleDOI

[...]

TL;DR: Discusses the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room; the resulting impulse response, when convolved with any desired input signal, simulates room reverberation of that signal.
Abstract: Image methods are commonly used for the analysis of the acoustic properties of enclosures. In this paper we discuss the theoretical and practical use of image techniques for simulating, on a digital computer, the impulse response between two points in a small rectangular room. The resulting impulse response, when convolved with any desired input signal, such as speech, simulates room reverberation of the input signal. This technique is useful in signal processing or psychoacoustic studies. The entire process is carried out on a digital computer so that a wide range of room parameters can be studied with accurate control over the experimental conditions. A Fortran implementation of this model is included.
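
As a rough illustration of the construction described above (not a faithful reimplementation of the published model), the NumPy sketch below enumerates image sources for a rectangular room using a single frequency-independent reflection coefficient shared by all six walls, rounds each delay to the nearest sample, and omits the fractional-delay and filtering refinements; the room geometry and parameter values are placeholders.

```python
# Simplified image-source room impulse response: one uniform reflection
# coefficient, nearest-sample delays, no post-filtering.
import itertools
import numpy as np

def image_source_rir(room, src, mic, beta=0.9, fs=16000, c=343.0,
                     order=8, length=4096):
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    h = np.zeros(length)
    for n, l, m in itertools.product(range(-order, order + 1), repeat=3):
        for q, j, k in itertools.product((0, 1), repeat=3):
            # Vector from this image source to the microphone.
            vec = np.array([
                (1 - 2 * q) * src[0] - mic[0] + 2 * n * room[0],
                (1 - 2 * j) * src[1] - mic[1] + 2 * l * room[1],
                (1 - 2 * k) * src[2] - mic[2] + 2 * m * room[2],
            ])
            dist = np.linalg.norm(vec)
            # Number of wall reflections encoded by this image.
            refl = (abs(n - q) + abs(n) + abs(l - j) + abs(l)
                    + abs(m - k) + abs(m))
            t = int(round(dist / c * fs))
            if t < length:
                h[t] += beta ** refl / (4.0 * np.pi * max(dist, 1e-3))
    return h

# Example: a 6 x 4 x 3 m room; convolving speech with h reverberates it,
# e.g. scipy.signal.fftconvolve(speech, h).
h = image_source_rir(room=[6.0, 4.0, 3.0], src=[2.0, 3.0, 1.5], mic=[4.0, 1.0, 1.5])
```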

3,284 citations


"Pitch tracking in reverberant envir..." refers background in this paper

  • [...]

Journal ArticleDOI

[...]

TL;DR: An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds, based on the well-known autocorrelation method with a number of modifications that combine to prevent errors.
Abstract: An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.
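
A bare-bones version of the autocorrelation-style core described above (the difference function, its cumulative-mean normalisation, and the absolute-threshold dip search) can be sketched as follows; the parabolic-interpolation and best-local-estimate refinements of the full algorithm are left out, and the threshold and frequency range are illustrative.

```python
# Minimal sketch of the core steps of a YIN-style F0 estimator.
import numpy as np

def yin_f0(frame, sr, fmin=65.0, fmax=500.0, threshold=0.1):
    tau_min = int(sr / fmax)
    tau_max = int(sr / fmin)          # frame must be longer than tau_max
    taus = np.arange(1, tau_max + 1)
    # Difference function d(tau).
    d = np.array([np.sum((frame[:-tau] - frame[tau:]) ** 2) for tau in taus])
    # Cumulative mean normalised difference d'(tau).
    d_norm = d * taus / np.maximum(np.cumsum(d), 1e-12)
    # Absolute threshold: first lag whose d' dips below the threshold,
    # falling back to the global minimum over the search range.
    search = d_norm[tau_min - 1:]
    below = np.flatnonzero(search < threshold)
    tau = (below[0] if below.size else np.argmin(search)) + tau_min
    return sr / tau

# e.g. f0 = yin_f0(frame, 16000) on a 40 ms voiced frame (640 samples).
```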

1,835 citations


"Pitch tracking in reverberant envir..." refers background or methods in this paper

  • [...]

  • [...]

  • [...]

  • [...]

  • [...]

Journal ArticleDOI

[...]

TL;DR: Presents a segregation system that is consistent with psychological and physiological findings and whose performance is significantly better than that of the frame-based segregation scheme described by Meddis and Hewitt (1992).
Abstract: Although the ability of human listeners to perceptually segregate concurrent sounds is well documented in the literature, there have been few attempts to exploit this research in the design of computational systems for sound source segregation. In this paper, we present a segregation system that is consistent with psychological and physiological findings. The system is able to segregate speech from a variety of intrusive sounds, including other speech, with some success. The segregation system consists of four stages. Firstly, the auditory periphery is modelled by a bank of bandpass filters and a simulation of neuromechanical transduction by inner hair cells. In the second stage of the system, periodicities, frequency transitions, onsets and offsets in auditory nerve firing patterns are made explicit by separate auditory representations. The representations, auditory maps, are based on the known topographical organization of the higher auditory pathways. Information from the auditory maps is used to construct a symbolic description of the auditory scene. Specifically, the acoustic input is characterized as a collection of time-frequency elements, each of which describes the movement of a spectral peak in time and frequency. In the final stage of the system, a search strategy is employed which groups elements according to the similarity of their fundamental frequencies, onset times and offset times. Following the search, a waveform can be resynthesized from a group of elements so that segregation performance may be assessed by informal listening tests. The system has been evaluated using a database of voiced speech mixed with a variety of intrusive noises such as music, "office" noise and other speech. A technique for quantitative evaluation of the system is described, in which the signal-to-noise ratio (SNR) is compared before and after the segregation process. After segregation, an increase in SNR is obtained for each noise condition. Additionally, the performance of our system is significantly better than that of the frame-based segregation scheme described by Meddis and Hewitt (1992).
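
The evaluation step mentioned at the end of the abstract (SNR compared before and after segregation) is easy to sketch. The snippet below uses synthetic stand-in signals and measures each SNR against the clean target, which is one common convention, though not necessarily the exact metric used in the paper.

```python
# Sketch of an SNR before/after comparison, with stand-in signals.
import numpy as np

def snr_db(target, estimate):
    """SNR (dB) of `estimate` measured against the clean `target`."""
    residual = estimate - target
    return 10.0 * np.log10(np.sum(target ** 2) /
                           np.maximum(np.sum(residual ** 2), 1e-12))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 150 * t)                       # stand-in "voiced speech"
intrusion = 0.5 * rng.standard_normal(t.size)              # stand-in intrusive noise
mixture = speech + intrusion                               # signal before segregation
segregated = speech + 0.1 * rng.standard_normal(t.size)    # pretend system output
print("SNR before: %.1f dB  after: %.1f dB"
      % (snr_db(speech, mixture), snr_db(speech, segregated)))
```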

788 citations

Journal ArticleDOI

[...]

TL;DR: A comparative performance study of seven pitch detection algorithms was conducted on a speech database of eight utterances spoken by three males, three females, and one child, to assess the detectors' relative performance as a function of recording condition and pitch range of the various speakers.
Abstract: A comparative performance study of seven pitch detection algorithms was conducted. A speech data base, consisting of eight utterances spoken by three males, three females, and one child, was constructed. Telephone, close talking microphone, and wideband recordings were made of each of the utterances. For each of the utterances in the data base, a "standard" pitch contour was semiautomatically measured using a highly sophisticated interactive pitch detection program. The "standard" pitch contour was then compared with the pitch contour that was obtained from each of the seven programmed pitch detectors. The algorithms used in this study were 1) a center clipping, infinite-peak clipping, modified autocorrelation method (AUTOC), 2) the cepstral method (CEP), 3) the simplified inverse filtering technique (SIFT) method, 4) the parallel processing time-domain method (PPROC), 5) the data reduction method (DARD), 6) a spectral flattening linear predictive coding (LPC) method, and 7) the average magnitude difference function (AMDF) method. A set of measurements was made on the pitch contours to quantify the various types of errors which occur in each of the above methods. Included among the error measurements were the average and standard deviation of the error in pitch period during voiced regions, the number of gross errors in the pitch period, and the average number of voiced-unvoiced classification errors. For each of the error measurements, the individual pitch detectors could be rank ordered as a measure of their relative performance as a function of recording condition and pitch range of the various speakers. Performance scores are presented for each of the seven pitch detectors based on each of the categories of error.
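
The error measurements listed above are straightforward to compute once a "standard" contour is available. The sketch below assumes both contours are arrays of pitch periods in samples with 0 marking unvoiced frames, and uses an illustrative 10% deviation rule for gross errors rather than the paper's exact thresholds.

```python
# Pitch-contour error measurements against a "standard" reference contour.
# Contours: pitch period in samples per frame, 0 = unvoiced frame.
import numpy as np

def pitch_error_stats(standard, estimated, gross_frac=0.10):
    standard = np.asarray(standard, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    voiced_ref = standard > 0
    voiced_est = estimated > 0
    # Voiced/unvoiced classification errors.
    vuv_errors = int(np.sum(voiced_ref != voiced_est))
    # Frames where both contours are voiced.
    both = voiced_ref & voiced_est
    dev = np.abs(estimated[both] - standard[both])
    gross = dev > gross_frac * standard[both]   # gross pitch-period errors
    fine = dev[~gross]                          # fine errors on the remaining frames
    return {
        "gross_errors": int(np.sum(gross)),
        "fine_error_mean": float(fine.mean()) if fine.size else 0.0,
        "fine_error_std": float(fine.std()) if fine.size else 0.0,
        "vuv_errors": vuv_errors,
    }
```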

778 citations


"Pitch tracking in reverberant envir..." refers background in this paper

  • [...]

Journal ArticleDOI

[...]

TL;DR: Describes the experiences of researchers at MIT in the collection of two large speech databases, TIMIT and Voyager, which have somewhat complementary objectives.
Abstract: Automatic speech recognition by computers can provide the most natural and efficient method of communication between humans and computers. While high-performance speech recognition systems have begun to emerge from research institutions in recent years, scientists unequivocally agree that the deployment of speech recognition systems into realistic operating environments will require many hours of speech data to help us model the inherent variability in the speech signal. This paper describes the experiences of researchers at MIT in the collection of two large speech databases, which have somewhat complementary objectives. The TIMIT database was designed to be task- and speaker-independent, and is suitable for general acoustic-phonetic research. The Voyager database, on the other hand, was intended for development and evaluation of a system which incorporates both speech and natural language processing. This database is particularly valuable as a source of spontaneous utterances elicited in a realistic goal-oriented environment.

499 citations


"Pitch tracking in reverberant envir..." refers background or methods in this paper

  • [...]

  • [...]

  • [...]

  • [...]