Author
K.S.R. Murty
Bio: K.S.R. Murty is an academic researcher from the Indian Institutes of Technology. The author has contributed to research on the topics of Noise and Epoch (reference date). The author has an h-index of 1 and has co-authored 1 publication receiving 530 citations.
Topics: Noise, Epoch (reference date), Speech processing
Papers
TL;DR: Notably, epoch extraction by the proposed method is robust against degradations such as white noise, babble, high-frequency channel, and vehicle noise.
Abstract: Epoch is the instant of significant excitation of the vocal-tract system during the production of speech. For most voiced speech, the most significant excitation takes place around the instant of glottal closure. Extraction of epochs from speech is a challenging task due to the time-varying characteristics of the source and the system. Most epoch extraction methods attempt to remove the characteristics of the vocal-tract system in order to emphasize the excitation characteristics in the residual; the performance of such methods depends critically on our ability to model the system. In this paper, we propose a method for epoch extraction that does not depend critically on the characteristics of the time-varying vocal-tract system. The method exploits the impulse-like nature of the excitation. The proposed zero-resonance-frequency filter output brings out the epoch locations with high accuracy and reliability. The performance of the method is demonstrated on the CMU-Arctic database, using epoch information from the electroglottograph as the reference. The proposed method performs significantly better than other methods currently available for epoch extraction. Notably, epoch extraction by the proposed method is robust against degradations such as white noise, babble, high-frequency channel, and vehicle noise.
569 citations
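The zero-frequency filtering idea described in the abstract can be sketched in a few lines: difference the signal, pass it through a cascade of two zero-frequency resonators (equivalent to repeated integration), and remove the resulting polynomial trend by subtracting a local mean over roughly one average pitch period. The window length, the number of trend-removal passes, and the function names below are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def zero_frequency_filter(s, fs, pitch_ms=10.0, trend_passes=3):
    """Sketch of zero-frequency filtering for epoch extraction.

    pitch_ms approximates the average pitch period; trend_passes is the
    number of local-mean-removal passes (both are assumed values here).
    """
    # Differencing removes any DC offset in the recording.
    x = np.diff(s, prepend=s[0]).astype(float)
    # A cascade of two second-order zero-frequency resonators,
    # y[n] = 2*y[n-1] - y[n-2] + x[n], amounts to four cumulative sums.
    y = x.copy()
    for _ in range(4):
        y = np.cumsum(y)
    # Integration makes the output grow polynomially; repeatedly
    # subtracting a local mean over ~one pitch period removes that trend.
    win = int(round(pitch_ms * 1e-3 * fs))
    win += 1 - win % 2                      # force an odd window length
    kernel = np.ones(win) / win
    for _ in range(trend_passes):
        y = y - np.convolve(y, kernel, mode="same")
    return y

def find_epochs(zff):
    # Epochs correspond to positive-going zero crossings of the ZFF output.
    return np.nonzero((zff[:-1] < 0) & (zff[1:] >= 0))[0] + 1
```

On a synthetic impulse train the positive zero crossings recur once per glottal cycle; on real speech the window length should track the speaker's average pitch period.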
Cited by
TL;DR: In this paper, five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers.
Abstract: The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires, however, that the precise locations of the glottal closure instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for detecting GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, containing many hours of speech by multiple speakers. The five techniques compared are Hilbert Envelope-based detection (HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming Phase Slope Algorithm (DYPSA), Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS), and the Yet Another GCI Algorithm (YAGA). The efficacy of these methods is first evaluated on clean speech, in terms of both reliability and accuracy. Their robustness to additive noise and to reverberation is also assessed. A further contribution of the paper is the evaluation of their performance on a concrete speech processing application: the causal-anticausal decomposition of speech. It is shown that for clean speech, SEDREAMS and YAGA are the best-performing techniques, in terms of both identification rate and accuracy. ZFR and SEDREAMS also show superior robustness to additive noise and reverberation.
241 citations
TL;DR: The accuracy of fundamental frequency estimation by the proposed method is comparable to, or even better than, that of many existing methods; the method is also robust against rapid variation of the pitch period and vocal-tract changes.
Abstract: Exploiting the impulse-like nature of excitation in the sequence of glottal cycles, a method is proposed to derive the instantaneous fundamental frequency from speech signals. The method involves passing the speech signal through two ideal resonators located at zero frequency. A filtered signal is derived from the output of the resonators by subtracting the local mean computed over an interval corresponding to the average pitch period. The positive zero crossings in the filtered signal correspond to the locations of the strong impulses in each glottal cycle. The instantaneous fundamental frequency is then obtained by taking the reciprocal of the interval between successive positive zero crossings. Due to the filtering by the zero-frequency resonators, the effects of noise and vocal-tract variations are practically eliminated; for the same reason, the method is also robust to degradation of speech by additive noise. The accuracy of the fundamental frequency estimation by the proposed method is comparable to, or even better than, that of many existing methods. Moreover, the proposed method is robust against rapid variation of the pitch period and vocal-tract changes. The method works well even when the glottal cycles are not periodic or when the speech signals are not correlated across successive glottal cycles.
201 citations
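Once the positive zero crossings (the epoch locations) are available, the instantaneous fundamental frequency described above follows directly as the reciprocal of the interval between successive crossings. A minimal sketch (the function name and input form are assumptions):

```python
import numpy as np

def instantaneous_f0(epochs, fs):
    """Instantaneous F0 per glottal cycle: the reciprocal of the interval
    between successive epoch locations (sample indices), in Hz."""
    intervals = np.diff(np.asarray(epochs, dtype=float))
    return fs / intervals

# Epochs every 80 samples at 8 kHz give a 100 Hz contour.
f0 = instantaneous_f0(np.arange(0, 800, 80), 8000)   # every element is 100.0
```

Because one estimate is produced per glottal cycle, the contour tracks rapid pitch changes without assuming periodicity over a longer analysis frame.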
TL;DR: This review examines an era spanning five decades during which this topic has been under development, covering the estimation methods of the glottal source, the parameterization techniques developed to express the estimated glottal excitations in numerical form, and the application areas of GIF.
Abstract: Glottal inverse filtering (GIF) refers to methods of estimating the source of voiced speech, the glottal volume velocity waveform. GIF is based on the idea of inversion, in which the effects of the vocal tract and lip radiation are cancelled from the output of the voice production mechanism, the speech signal. This article provides a review on GIF research by examining an era spanning five decades during which this topic has been under development. The topic is handled from three main perspectives: the estimation methods of the glottal source, the parameterization techniques that have been developed to express the estimated glottal excitations in numerical forms, and the application areas of GIF. Finally, the strengths and limitations of the GIF approach are discussed.
186 citations
01 Jan 2009
TL;DR: In this paper, a new procedure is proposed to detect Glottal Closure and Opening Instants (GCIs and GOIs) directly from speech waveforms; the procedure is divided into two successive steps.
Abstract: This paper proposes a new procedure to detect Glottal Closure and Opening Instants (GCIs and GOIs) directly from speech waveforms. The procedure is divided into two successive steps. First, a mean-based signal is computed, and intervals where speech events are expected to occur are extracted from it. Second, within each interval, a precise position of the speech event is assigned by locating a discontinuity in the Linear Prediction residual. The proposed method is compared to the DYPSA algorithm on the CMU ARCTIC database; a significant improvement as well as better noise robustness are reported. In addition, the GOI identification accuracy results are promising for glottal source characterization.
175 citations
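The two steps described in the abstract, a mean-based signal followed by a linear-prediction residual, can be sketched as follows. The Blackman window, its length, the predictor order, and the autocorrelation-based LP solution are assumptions drawn from the general literature, not this paper's exact settings.

```python
import numpy as np

def mean_based_signal(x, win_len):
    """Step 1 (sketch): smooth the speech waveform with a normalized
    window so that roughly one oscillation per glottal cycle remains.
    win_len (samples) should be tied to the mean pitch period; the exact
    factor is an assumption here."""
    w = np.blackman(win_len)
    return np.convolve(x, w / w.sum(), mode="same")

def lp_residual(x, order=12):
    """Step 2 (sketch): linear-prediction residual via the autocorrelation
    method; its strongest discontinuity inside each candidate interval
    marks the glottal closure instant."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])          # Yule-Walker equations
    pred = np.convolve(x, np.concatenate(([0.0], a)), mode="full")[: len(x)]
    return x - pred
```

The mean-based signal narrows the search to one interval per glottal cycle, which is what makes the residual-peak picking in step 2 robust to spurious discontinuities elsewhere.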
TL;DR: The results indicate that recognition performance using local prosodic features is better than that using global prosodic features.
Abstract: In this paper, global and local prosodic features extracted from sentences, words, and syllables are proposed for speech emotion (affect) recognition. Duration, pitch, and energy values are used to represent the prosodic information for recognizing emotions from speech. Global prosodic features represent gross statistics such as the mean, minimum, maximum, standard deviation, and slope of the prosodic contours; local prosodic features represent the temporal dynamics of the prosody. In this work, global and local prosodic features are analyzed separately and in combination, at different levels, for the recognition of emotions. We have also explored words and syllables at different positions (initial, middle, and final) separately, to analyze their contribution to the recognition of emotions. All the studies are carried out using the simulated Telugu emotion speech corpus (IITKGP-SESC), and the results are compared with those on the internationally known Berlin emotion speech corpus (Emo-DB). Support vector machines are used to develop the emotion recognition models. The results indicate that recognition performance using local prosodic features is better than that using global prosodic features. Words in the final position of sentences and syllables in the final position of words exhibit more emotion-discriminative information than words and syllables in other positions.
149 citations
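The global prosodic statistics named in the abstract (mean, minimum, maximum, standard deviation, and slope of a contour) are straightforward to compute. A minimal sketch, with the slope taken as a least-squares line fit over the contour, which is one reasonable reading of "slope" (an assumption here):

```python
import numpy as np

def global_prosodic_features(contour):
    """Gross statistics of a prosodic contour (e.g. a pitch or energy
    track): mean, minimum, maximum, standard deviation, and slope."""
    c = np.asarray(contour, dtype=float)
    slope = np.polyfit(np.arange(len(c)), c, 1)[0]  # least-squares slope
    return {"mean": c.mean(), "min": c.min(), "max": c.max(),
            "std": c.std(), "slope": slope}

feats = global_prosodic_features([1.0, 3.0, 5.0, 7.0, 9.0])
# feats["mean"] == 5.0 and feats["slope"] is close to 2.0
```

Local prosodic features, by contrast, keep the frame-by-frame contour values (the temporal dynamics) rather than collapsing them to these five statistics.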