Proceedings ArticleDOI

Implementation of pitch detection algorithms for pathological voices

01 Aug 2016-Vol. 2016, pp 1-5
TL;DR: The paper attempts to identify and classify pathological voices from normal voices using the k-NN algorithm; the relative performance of the pitch detection algorithms is compared in terms of accuracy and speed of operation.
Abstract: The pitch detection algorithm is an important component of speech processing algorithms. It can be used for speaker recognition, speech instruction for the hearing impaired, vocoder systems, discriminating between normal and pathological voices, etc. Hence, robust and accurate determination of pitch is necessary. The paper determines the pitch of the speech signal with four pitch detection algorithms: 1) the autocorrelation method, 2) the cepstrum method, 3) the simplified inverse filtering technique (SIFT), and 4) the data reduction method. Measurements are made on the pitch contours to explore how a voice/acoustic parameter such as the fundamental frequency can indicate a person's health. The paper attempts to identify and classify pathological voices from normal voices using the k-NN algorithm. A relative performance comparison of the above-mentioned algorithms is made in terms of accuracy and speed of operation.
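As a concrete illustration of the first of the four methods, the sketch below estimates F0 of a single voiced frame with a plain autocorrelation peak search. The sampling rate, pitch range, and frame length are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=50.0, fmax=500.0):
    """Estimate F0 of one voiced frame via the autocorrelation method.

    Illustrative sketch only; the paper's implementation details
    (windowing, clipping, post-correction) are not reproduced here.
    """
    frame = frame - np.mean(frame)                 # remove DC offset
    ac = np.correlate(frame, frame, mode="full")   # autocorrelation
    ac = ac[len(ac) // 2:]                         # keep non-negative lags
    lo = int(fs / fmax)                            # shortest plausible period
    hi = int(fs / fmin)                            # longest plausible period
    lag = lo + int(np.argmax(ac[lo:hi]))           # strongest peak in range
    return fs / lag                                # period (samples) -> Hz

# Usage: a synthetic 120 Hz "voiced" frame sampled at 16 kHz.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
print(autocorr_pitch(frame, fs))  # close to 120 Hz
```

Restricting the peak search to the lag range implied by the expected pitch range is what keeps the estimator from locking onto formant-related peaks at very short lags.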
Citations
Journal ArticleDOI
01 Oct 1980

1,565 citations

Proceedings ArticleDOI
06 Mar 2020
TL;DR: An elaborate literature survey of both traditional and deep learning-based methods of speaker recognition and voice comparison, intended to provide substantial input to beginners and researchers for understanding the domain.
Abstract: Voice comparison is a variant of speaker recognition or voice recognition. Voice comparison plays a significant role in forensic science and security systems. Precise voice comparison is a challenging problem. Traditionally, researchers used different classification and comparison models to solve speaker recognition and voice comparison, respectively, but deep learning is gaining popularity because of its accuracy when trained with large amounts of data. This paper presents an elaborate literature survey of both traditional and deep learning-based methods of speaker recognition and voice comparison. It also discusses publicly available datasets that researchers use for speaker recognition and voice comparison. This concise paper should provide substantial input to beginners and researchers for understanding the domain of voice recognition and voice comparison.

42 citations


Cites background from "Implementation of pitch detection a..."

  • ...Voice comparison [1] is a difficult problem to solve because the voice of a person may change due to the emotion, age-gap, and throat infection [2]....


Journal ArticleDOI
Tao Zhang1, Yangyang Shao1, Yaqin Wu1, Zhibo Pang, Ganjun Liu1 
TL;DR: A multiple-vowel repair method for voice disorder based on pitch extraction and the Line Spectrum Pair feature is proposed, which broadens the research subjects of voice repair from the single vowel /a/ to the multiple vowels /a/, /i/ and /u/ and repairs these vowels successfully.
Abstract: Individuals such as voice-related professionals, elderly people and smokers increasingly suffer from voice disorder, which underlines the importance of pathological voice repair. Previous work on pathological voice repair concerned only the sustained vowel /a/; multiple-vowel repair remains challenging due to unstable pitch extraction and unsatisfactory formant reconstruction. In this paper, a multiple-vowel repair method for voice disorder based on pitch extraction and the Line Spectrum Pair feature is proposed, which broadens the research subjects of voice repair from the single vowel /a/ to the multiple vowels /a/, /i/ and /u/ and repairs these vowels successfully. Using a deep neural network as a classifier, voice recognition is performed to classify the normal and pathological voices. The Wavelet Transform and the Hilbert-Huang Transform are applied for pitch extraction. Based on the Line Spectrum Pair (LSP) feature, the formant is reconstructed. The final repaired voice is obtained by synthesizing the pitch and the formant. The proposed method is validated on the Saarbrucken Voice Database (SVD). The achieved improvements in three metrics, Segmental Signal-to-Noise Ratio, the LSP distance measure and the Mel cepstral distance measure, are 45.87%, 50.37% and 15.56%, respectively. In addition, an intuitive analysis based on spectrograms shows a prominent repair effect.

9 citations


Cites background from "Implementation of pitch detection a..."

  • ...Voice disorder has been affecting various social categories accounting for around 25% of the world population [1], such as voice-related professionals like teachers [2], elderly people, smokers, patients in respiratory, nasal and larynx diseases [3] and so on....


Journal ArticleDOI
TL;DR: In this article, the authors present an algorithm that determines the harmonic parameters using a Hanning window with a length of six glottal cycles, which allows extraction of the parameters close to their reference values.
Abstract: The harmonic parameters Autocorrelation, Harmonic to Noise Ratio (HNR) and Noise to Harmonic Ratio (NHR) are related to vocal quality, providing alternative measures of the harmonic energy of a speech signal. They will be used as input features for an intelligent medical decision support system for the diagnosis of speech pathology. An efficient algorithm is important when implementing such a system on low-power devices. This article presents an algorithm that determines these parameters by optimizing the window type and length. The method comparatively analyzes the values produced by the algorithm for different combinations of window type and length against a reference value. Hamming, Hanning and Blackman windows with lengths of 3, 6, 12 and 24 glottal cycles and various sampling frequencies were investigated. As a result, we present an efficient algorithm that determines the parameters using the Hanning window with a length of six glottal cycles. The mean difference in Autocorrelation is less than 0.004, and that in HNR is less than 0.42 dB. In conclusion, this algorithm allows extraction of the parameters close to their reference values. For Autocorrelation, there are no significant effects of sampling frequency; however, the algorithm should be used cautiously for HNR at lower sampling rates.
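A sketch of how these harmonic parameters can be derived from the normalized autocorrelation at the pitch period, using the recommended Hanning window of six glottal cycles. The normalization against the window's own autocorrelation and the formula HNR = 10·log10(r/(1−r)) follow the common Praat-style definitions and are assumptions here, not the article's exact algorithm.

```python
import numpy as np

def harmonic_params(frame, fs, f0):
    """Autocorrelation, HNR and NHR of one voiced frame with known F0.

    Sketch under assumptions: a Hanning window of six glottal cycles (as
    the article recommends) and Praat-style definitions; the reference
    implementation may differ in detail.
    """
    n = int(round(6 * fs / f0))                  # six glottal cycles
    w = np.hanning(n)
    seg = frame[:n] * w
    seg = seg - np.mean(seg)
    ac = np.correlate(seg, seg, mode="full")[n - 1:]
    wac = np.correlate(w, w, mode="full")[n - 1:]
    lag = int(round(fs / f0))                    # lag of one pitch period
    r = (ac[lag] / ac[0]) / (wac[lag] / wac[0])  # window-normalized r(T0)
    r = min(max(r, 1e-6), 1 - 1e-6)              # keep the log argument valid
    hnr = 10 * np.log10(r / (1 - r))             # harmonic-to-noise ratio, dB
    nhr = (1 - r) / r                            # noise-to-harmonic ratio
    return r, hnr, nhr

# Usage: a clean versus a noisy 200 Hz tone at 16 kHz.
fs, f0 = 16000, 200.0
t = np.arange(int(0.1 * fs)) / fs
clean = np.sin(2 * np.pi * f0 * t)
noisy = clean + 0.3 * np.random.default_rng(0).normal(size=t.size)
print(harmonic_params(clean, fs, f0)[1], harmonic_params(noisy, fs, f0)[1])
```

Dividing by the window's autocorrelation compensates for the taper, which would otherwise depress r(T0) even for a perfectly periodic signal.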

2 citations

Journal ArticleDOI
01 Jan 2020
TL;DR: Several methods for pitch frequency estimation are investigated and compared on clean and reverberant male and female speech signals to select the one least affected by reverberation.
Abstract: Reverberation is an effect that occurs regularly in closed rooms due to multiple reflections. This paper investigates the effect of reverberation on both male and female speech signals. The effect shows up in the pitch frequency of the speech signal, a parameter that is important because it is commonly used for speaker identification. Hence, several methods for pitch frequency estimation are investigated and compared on clean and reverberant male and female speech signals to select the one least affected by reverberation.

Cites methods from "Implementation of pitch detection a..."

  • ...Cepstral coefficients are calculated as shown in (6) [9]: c[τ] = F⁻¹{log(|F{x[n]}|²)} (6), where F⁻¹ denotes the inverse Fourier transform, F the Fourier transform, x[n] the discrete signal, and |F{x[n]}|² the estimated power spectrum of the signal....

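The formula in the quote above turns into a few lines of code: take the log power spectrum, transform it back, and search the quefrency range corresponding to plausible pitch periods. The test signal and search bounds below are illustrative assumptions.

```python
import numpy as np

def cepstral_pitch(x, fs, fmin=50.0, fmax=500.0):
    """Pitch estimate via the real cepstrum c[tau] = IFFT{log |FFT{x}|^2}."""
    spectrum = np.fft.fft(x * np.hanning(len(x)))
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)  # floor avoids log(0)
    cepstrum = np.fft.ifft(log_power).real
    lo, hi = int(fs / fmax), int(fs / fmin)            # quefrency range
    peak = lo + int(np.argmax(cepstrum[lo:hi]))        # strongest rahmonic
    return fs / peak

# Usage: a 100 Hz sawtooth (rich in harmonics) sampled at 16 kHz.
fs = 16000
t = np.arange(int(0.05 * fs)) / fs
x = (t * 100) % 1.0 - 0.5
print(cepstral_pitch(x, fs))  # close to 100 Hz
```

The log separates the source and the vocal-tract envelope additively, so the harmonic comb shows up as a sharp peak at the quefrency of one pitch period while the envelope stays near quefrency zero.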

References
Book
05 Sep 1978
TL;DR: This book presents the fundamentals of digital speech processing, covering digital models for the speech signal, time-domain methods, digital representations of the speech waveform, short-time Fourier analysis, homomorphic processing, linear predictive coding, and digital speech processing for man-machine communication by voice.
Abstract: 1. Introduction. 2. Fundamentals of Digital Speech Processing. 3. Digital Models for the Speech Signal. 4. Time-Domain Models for Speech Processing. 5. Digital Representation of the Speech Waveform. 6. Short-Time Fourier Analysis. 7. Homomorphic Speech Processing. 8. Linear Predictive Coding of Speech. 9. Digital Speech Processing for Man-Machine Communication by Voice.

3,103 citations

Journal ArticleDOI
01 Oct 1980

1,565 citations


"Implementation of pitch detection a..." refers background in this paper

  • ...This is due to the fact that the shape and dimensions of the above-mentioned parameters vary from person to person [2]....


Book
01 Jan 2001
TL;DR: This chapter discusses the Discrete-Time Speech Signal Processing Framework, a model based on the FBS Method, and its applications in Speech Communication Pathway and Homomorphic Signal Processing.
Abstract: (NOTE: Each chapter begins with an introduction and concludes with a Summary, Exercises and Bibliography.) 1. Introduction. Discrete-Time Speech Signal Processing. The Speech Communication Pathway. Analysis/Synthesis Based on Speech Production and Perception. Applications. Outline of Book. 2. A Discrete-Time Signal Processing Framework. Discrete-Time Signals. Discrete-Time Systems. Discrete-Time Fourier Transform. Uncertainty Principle. z-Transform. LTI Systems in the Frequency Domain. Properties of LTI Systems. Time-Varying Systems. Discrete-Fourier Transform. Conversion of Continuous Signals and Systems to Discrete Time. 3. Production and Classification of Speech Sounds. Anatomy and Physiology of Speech Production. Spectrographic Analysis of Speech. Categorization of Speech Sounds. Prosody: The Melody of Speech. Speech Perception. 4. Acoustics of Speech Production. Physics of Sound. Uniform Tube Model. A Discrete-Time Model Based on Tube Concatenation. Vocal Fold/Vocal Tract Interaction. 5. Analysis and Synthesis of Pole-Zero Speech Models. Time-Dependent Processing. All-Pole Modeling of Deterministic Signals. Linear Prediction Analysis of Stochastic Speech Sounds. Criterion of "Goodness". Synthesis Based on All-Pole Modeling. Pole-Zero Estimation. Decomposition of the Glottal Flow Derivative. Appendix 5.A: Properties of Stochastic Processes. Random Processes. Ensemble Averages. Stationary Random Process. Time Averages. Power Density Spectrum. Appendix 5.B: Derivation of the Lattice Filter in Linear Prediction Analysis. 6. Homomorphic Signal Processing. Concept. Homomorphic Systems for Convolution. Complex Cepstrum of Speech-Like Sequences. Spectral Root Homomorphic Filtering. Short-Time Homomorphic Analysis of Periodic Sequences. Short-Time Speech Analysis. Analysis/Synthesis Structures. Contrasting Linear Prediction and Homomorphic Filtering. 7. Short-Time Fourier Transform Analysis and Synthesis. Short-Time Analysis. Short-Time Synthesis. 
Short-Time Fourier Transform Magnitude. Signal Estimation from the Modified STFT or STFTM. Time-Scale Modification and Enhancement of Speech. Appendix 7.A: FBS Method with Multiplicative Modification. 8. Filter-Bank Analysis/Synthesis. Revisiting the FBS Method. Phase Vocoder. Phase Coherence in the Phase Vocoder. Constant-Q Analysis/Synthesis. Auditory Modeling. 9. Sinusoidal Analysis/Synthesis. Sinusoidal Speech Model. Estimation of Sinewave Parameters. Synthesis. Source/Filter Phase Model. Additive Deterministic-Stochastic Model. Appendix 9.A: Derivation of the Sinewave Model. Appendix 9.B: Derivation of Optimal Cubic Phase Parameters. 10. Frequency-Domain Pitch Estimation. A Correlation-Based Pitch Estimator. Pitch Estimation Based on a "Comb Filter". Pitch Estimation Based on a Harmonic Sinewave Model. Glottal Pulse Onset Estimation. Multi-Band Pitch and Voicing Estimation. 11. Nonlinear Measurement and Modeling Techniques. The STFT and Wavelet Transform Revisited. Bilinear Time-Frequency Distributions. Aeroacoustic Flow in the Vocal Tract. Instantaneous Teager Energy Operator. 12. Speech Coding. Statistical Models of Speech. Scalar Quantization. Vector Quantization (VQ). Frequency-Domain Coding. Model-Based Coding. LPC Residual Coding. 13. Speech Enhancement. Introduction. Preliminaries. Wiener Filtering. Model-Based Processing. Enhancement Based on Auditory Masking. Appendix 13.A: Stochastic-Theoretic Parameter Estimation. 14. Speaker Recognition. Introduction. Spectral Features for Speaker Recognition. Speaker Recognition Algorithms. Non-Spectral Features in Speaker Recognition. Signal Enhancement for the Mismatched Condition. Speaker Recognition from Coded Speech. Appendix 14.A: Expectation-Maximization (EM) Estimation. Glossary. Speech Signal Processing. Units. Databases. Index. About the Author.

984 citations


"Implementation of pitch detection a..." refers background in this paper

  • ...The modified F0 corresponds to the vocal cords and to modifications of the vocal cords [1]....


Journal ArticleDOI
TL;DR: A comparative performance study of seven pitch detection algorithms was conducted on a speech database of eight utterances spoken by three males, three females, and one child, to assess their relative performance as a function of recording condition and pitch range of the various speakers.
Abstract: A comparative performance study of seven pitch detection algorithms was conducted. A speech database, consisting of eight utterances spoken by three males, three females, and one child, was constructed. Telephone, close-talking microphone, and wideband recordings were made of each of the utterances. For each of the utterances in the database, a "standard" pitch contour was semiautomatically measured using a highly sophisticated interactive pitch detection program. The "standard" pitch contour was then compared with the pitch contour obtained from each of the seven programmed pitch detectors. The algorithms used in this study were 1) a center clipping, infinite-peak clipping, modified autocorrelation method (AUTOC), 2) the cepstral method (CEP), 3) the simplified inverse filtering technique (SIFT) method, 4) the parallel processing time-domain method (PPROC), 5) the data reduction method (DARD), 6) a spectral flattening linear predictive coding (LPC) method, and 7) the average magnitude difference function (AMDF) method. A set of measurements was made on the pitch contours to quantify the various types of errors which occur in each of the above methods. Included among the error measurements were the average and standard deviation of the error in pitch period during voiced regions, the number of gross errors in the pitch period, and the average number of voiced-unvoiced classification errors. For each of the error measurements, the individual pitch detectors could be rank-ordered as a measure of their relative performance as a function of recording condition and pitch range of the various speakers. Performance scores are presented for each of the seven pitch detectors based on each of the categories of error.
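The error categories used in that study (voiced/unvoiced classification errors, gross pitch-period errors, and fine-error statistics) are straightforward to compute once a reference contour exists. The sketch below is an illustrative re-creation of those measures, not the original interactive program; the 1 ms gross-error threshold is an assumption.

```python
import numpy as np

def contour_errors(ref, est, gross_ms=1.0):
    """Compare an estimated pitch-period contour against a reference one.

    ref and est hold pitch periods in ms per frame, with 0 meaning unvoiced.
    Returns the voiced/unvoiced disagreement count, the number of gross
    errors (|difference| > gross_ms in frames both call voiced), and the
    mean and standard deviation of the remaining fine error.
    """
    ref, est = np.asarray(ref, float), np.asarray(est, float)
    vuv_errors = int(np.sum((ref > 0) != (est > 0)))     # V/UV disagreements
    both = (ref > 0) & (est > 0)                         # frames both voiced
    diff = est[both] - ref[both]
    gross = int(np.sum(np.abs(diff) > gross_ms))         # gross period errors
    fine = diff[np.abs(diff) <= gross_ms]                # remaining fine error
    return vuv_errors, gross, float(np.mean(fine)), float(np.std(fine))

# Usage: one V/UV miss each way, one gross error, one small fine error.
print(contour_errors([5, 5, 0, 5], [5.2, 8.0, 5.0, 0.0]))
```

Separating gross errors from fine errors before averaging keeps a few octave-type failures from swamping the statistics of an otherwise accurate detector.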

793 citations


"Implementation of pitch detection a..." refers methods in this paper

  • ...We apply an error correction method to remove discontinuities in the pitch markers [10]....


  • ...Thus, we calculate pitch for every sample [10]....


Journal ArticleDOI
Lawrence R. Rabiner1
TL;DR: Several types of (nonlinear) preprocessing that can be used to effectively spectrally flatten the speech signal are presented, and an algorithm for adaptively choosing a frame size for an autocorrelation pitch analysis is discussed.
Abstract: One of the most time-honored methods of detecting pitch is to use some type of autocorrelation analysis on speech which has been appropriately preprocessed. The goal of the speech preprocessing in most systems is to whiten, or spectrally flatten, the signal so as to eliminate the effects of the vocal tract spectrum on the detailed shape of the resulting autocorrelation function. The purpose of this paper is to present some results on several types of (nonlinear) preprocessing which can be used to effectively spectrally flatten the speech signal. The types of nonlinearities considered are classified by a nonlinear input-output quantizer characteristic. By appropriate adjustment of the quantizer threshold levels, both the ordinary (linear) autocorrelation analysis and the center clipping-peak clipping autocorrelation of Dubnowski et al. [1] can be obtained. Results are presented to demonstrate the degree of spectrum flattening obtained using these methods. Each of the proposed methods was tested on several of the utterances used in a recent pitch detector comparison study by Rabiner et al. [2]. Results of this comparison are included in this paper. One final topic discussed in this paper is an algorithm for adaptively choosing a frame size for an autocorrelation pitch analysis.
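The center clipping nonlinearity discussed above is simple to state in code: samples inside a threshold band are zeroed and the rest are shifted toward zero, removing much of the formant-driven fine structure before the autocorrelation analysis. The threshold factor of 30% of the frame peak is a common choice and an assumption here, not a value from the paper.

```python
import numpy as np

def center_clip(x, alpha=0.3):
    """Center-clip a frame: zero the band within +/- alpha * max|x|,
    shift samples outside the band toward zero by the clipping level."""
    cl = alpha * np.max(np.abs(x))          # clipping level for this frame
    y = np.zeros_like(x)
    y[x > cl] = x[x > cl] - cl              # positive excursions, shifted down
    y[x < -cl] = x[x < -cl] + cl            # negative excursions, shifted up
    return y

# Usage: small samples vanish, large ones shrink by the clipping level.
print(center_clip(np.array([0.1, 0.5, -0.8, 1.0])))
```

Because the low-level ripple between pitch pulses is mostly vocal-tract resonance, discarding it leaves an autocorrelation function dominated by the pitch periodicity.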

572 citations


"Implementation of pitch detection a..." refers methods in this paper

  • ...It is a time-domain method based on the center-clipping method [9]....
