Author

Petr Zelinka

Bio: Petr Zelinka is an academic researcher from Brno University of Technology. The author has contributed to research in topics including statistical models and hidden Markov models, has an h-index of 5, and has co-authored 9 publications receiving 87 citations.

Papers
Journal ArticleDOI
TL;DR: The impact of vocal effort variability on the performance of an isolated-word recognizer is shown and effective means of improving the system's robustness are tested.

51 citations

Journal ArticleDOI
TL;DR: A novel approach to nonstationary acoustical noise modeling via a set of hierarchically tied hidden Markov models in a classification tree structure is described, which allows detailed description of nonstationary ambient acoustical noise while maintaining low computational costs during recognition.
Abstract: Noise robustness is a key issue in successful deployment of automatic speech recognition systems in demanding environments such as hospital operating rooms. Perhaps the most successful way to overcome the additive noise obstacle is to employ a model adaptation scheme built around a set of dedicated clean-speech and noise-only statistical models. Existing recognizer designs generally rely on relatively simple noise models, as more detailed ones would increase computational demands significantly. Simple models are, however, unable to provide accurate characterization of the highly nonstationary noise present in real-world noisy facilities and thereby provide only limited reduction in the recognizer's error rate. The present article describes a novel approach to nonstationary acoustical noise modeling via a set of hierarchically tied hidden Markov models in a classification tree structure. The proposed statistical structure allows a detailed description of nonstationary ambient acoustical noise while maintaining low computational costs during recognition. The modeling performance of the proposed construction is verified on real background noise recorded during neurosurgery in a hospital operating room.
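The model-adaptation scheme in the abstract rests on scoring incoming audio against several dedicated noise models and favoring the best-matching one. A minimal, hypothetical sketch of that selection step, using plain Gaussian-emission HMMs and the forward algorithm (the hierarchical tying and classification tree of the paper are not reproduced here; the two toy models and their parameters are invented for illustration):

```python
import math

def gauss_logpdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def forward_loglik(obs, pi, A, means, variances):
    """Log-likelihood of a 1-D observation sequence under a Gaussian-emission
    HMM, computed with the forward algorithm in the log domain."""
    n = len(pi)
    log_alpha = [math.log(pi[i]) + gauss_logpdf(obs[0], means[i], variances[i])
                 for i in range(n)]
    for t in range(1, len(obs)):
        nxt = []
        for j in range(n):
            terms = [log_alpha[i] + math.log(A[i][j]) for i in range(n)]
            m = max(terms)  # log-sum-exp over predecessor states
            nxt.append(m + math.log(sum(math.exp(v - m) for v in terms))
                       + gauss_logpdf(obs[t], means[j], variances[j]))
        log_alpha = nxt
    m = max(log_alpha)
    return m + math.log(sum(math.exp(v - m) for v in log_alpha))

# Two hypothetical noise models: a steady "hum" vs. impulsive "clatter"
hum = dict(pi=[0.9, 0.1], A=[[0.95, 0.05], [0.05, 0.95]],
           means=[0.0, 0.1], variances=[0.01, 0.01])
clatter = dict(pi=[0.5, 0.5], A=[[0.6, 0.4], [0.4, 0.6]],
               means=[0.0, 2.0], variances=[0.05, 0.5])

frames = [0.0, 0.05, -0.02, 0.01, 0.03]  # a quiet, low-variance segment
scores = {name: forward_loglik(frames, **m)
          for name, m in [("hum", hum), ("clatter", clatter)]}
best = max(scores, key=scores.get)  # the quiet segment matches "hum"
```

In the paper's scheme, the tree structure would let many such models share tied states so that scoring all of them stays cheap; here each candidate is simply scored independently.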

14 citations

Journal ArticleDOI
31 May 2011
TL;DR: Experimental results show that analysis of the glottal excitation appears to be a useful approach to providing evidence of alcohol intoxication above 1.96 ‰, and a new collection of Czech alcoholized speech consisting of phonetically identical speech data spoken in both sober and intoxicated states was created.
Abstract: A significant part of the information carried in a speech signal refers to the speaker. This paper deals with investigating alcohol intoxication based on analyzing the recorded speech signal. Speech changes resulting from alcohol intoxication were investigated in the waveform of glottal pulses estimated from speech by applying Iterative Adaptive Inverse Filtering (IAIF). Experimental results show that analysis of the glottal excitation appears to be a useful approach to providing evidence of alcohol intoxication above 1.96 ‰. At this alcohol level, the associated negative effects influence professional performance and may in some cases lead to fatal accidents. By analyzing the speech signal, the speaker can be monitored automatically without their active co-operation. For use in our experiments, a new collection of Czech alcoholized speech was created, consisting of phonetically identical speech data spoken in both sober and intoxicated states. http://dx.doi.org/10.5755/j01.itc.40.2.429
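IAIF iteratively alternates between estimating the vocal-tract filter and the glottal contribution; the core operation in every pass is LPC inverse filtering. A simplified, single-pass sketch of that operation (not the full IAIF algorithm), using a toy all-pole "vocal tract" driven by a glottal-like pulse train:

```python
import math

def autocorr(x, maxlag):
    return [sum(x[t] * x[t + k] for t in range(len(x) - k))
            for k in range(maxlag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns prediction coefficients a[1..order]
    such that x[t] is predicted by sum_k a[k] * x[t - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def inverse_filter(x, a):
    """Prediction residual x[t] minus its LPC prediction: approximates the
    excitation once the all-pole (vocal-tract) resonances are removed."""
    return [x[t] - sum(a[k] * x[t - 1 - k] for k in range(min(len(a), t)))
            for t in range(len(x))]

# Toy "voiced speech": an AR(2) resonator driven by a glottal-like pulse train
pulses = [1.0 if t % 80 == 0 else 0.0 for t in range(400)]
x = [0.0, 0.0]
for t in range(2, 400):
    x.append(1.3 * x[t - 1] - 0.8 * x[t - 2] + pulses[t])

a = levinson(autocorr(x, 2), 2)  # recovers roughly [1.3, -0.8]
residual = inverse_filter(x, a)  # the pulse train re-emerges at t = 80, 160, ...
```

In full IAIF the residual would be further integrated (lip-radiation cancellation) and the process repeated with refined filter orders; this sketch stops at the first whitening step.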

12 citations

Proceedings ArticleDOI
07 Oct 2010
TL;DR: The severe impact of vocal effort variability on the accuracy of a speaker-dependent word recognizer is presented, and an efficient remedial measure using a multiple-model framework paired with an accurate speech mode detector is proposed.
Abstract: This paper describes an approach for enhancing the robustness of an isolated-word recognizer by extending its flexibility in the domain of the speaker's variable vocal effort level. An analysis of the spectral properties of spoken vowels in four speaking modes (whispered, soft, normal, and loud) confirms consistent spectral tilt changes. The severe impact of vocal effort variability on the accuracy of a speaker-dependent word recognizer is presented, and an efficient remedial measure using a multiple-model framework paired with an accurate speech mode detector is proposed.
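The spectral tilt referred to above can be quantified, for illustration, as the least-squares slope of the log-magnitude spectrum. A hedged sketch under that assumption, with toy signals standing in for vowels spoken at different effort levels:

```python
import cmath
import math

def log_magnitude_spectrum(x):
    """Magnitude spectrum in dB via a plain DFT (positive-frequency bins)."""
    n = len(x)
    return [20 * math.log10(abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                                    for t in range(n))) + 1e-12)
            for k in range(1, n // 2)]

def spectral_tilt(x):
    """Least-squares slope of the dB spectrum against bin index; more negative
    means energy falls off faster with frequency."""
    y = log_magnitude_spectrum(x)
    m = len(y)
    idx = list(range(m))
    mean_i = sum(idx) / m
    mean_y = sum(y) / m
    num = sum((i - mean_i) * (v - mean_y) for i, v in zip(idx, y))
    den = sum((i - mean_i) ** 2 for i in idx)
    return num / den

# Toy signals: a one-sample click (flat spectrum, tilt ~ 0) vs. a single
# low-frequency tone (energy concentrated at the bottom, negative tilt)
click = [1.0] + [0.0] * 63
soft = [math.sin(2 * math.pi * 2 * t / 64) for t in range(64)]
```

A mode detector in the spirit of the paper could threshold such a tilt measure (together with other features) to pick the matching acoustic model from the multiple-model set.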

10 citations

Proceedings ArticleDOI
01 Jan 2010

5 citations


Cited by
Journal ArticleDOI
TL;DR: The proposed Maxima Dispersion Quotient parameter is designed to quantify the extent of this dispersion and is shown to compare favorably to existing voice quality parameters, particularly for the analysis of continuous speech.
Abstract: This paper proposes a new parameter, the Maxima Dispersion Quotient (MDQ), for differentiating breathy from tense voice. Maxima derived following wavelet decomposition are often used for detecting edges in image processing, where the maxima cluster in the vicinity of the edge. Similarly, for tense voice, which typically displays sharp glottal closing characteristics, maxima following wavelet analysis are concentrated in the vicinity of the glottal closure instant (GCI). By contrast, as the phonation type tends away from tense voice towards breathier phonation, the maxima become increasingly dispersed. The MDQ parameter is designed to quantify the extent of this dispersion and is shown to compare favorably to existing voice quality parameters, particularly for the analysis of continuous speech. Classification experiments also reveal a significant improvement in the detection of the voice qualities when MDQ is included as an input to the classifier. Finally, MDQ is shown to be robust to additive noise down to a signal-to-noise ratio of 10 dB.
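The quantification step described above — turning per-scale wavelet maxima locations into a single dispersion number — can be sketched as follows. The wavelet decomposition and GCI detection are assumed to have been done already, and the normalization by pitch period is a simplified reading of the abstract, not the paper's exact formula:

```python
def mdq(maxima_locations, gci, pitch_period):
    """Maxima Dispersion Quotient (sketch): average distance of the per-scale
    wavelet maxima from the glottal closure instant, normalized by the pitch
    period. Small MDQ ~ tense voice; large MDQ ~ breathy voice."""
    dispersion = sum(abs(m - gci) for m in maxima_locations) / len(maxima_locations)
    return dispersion / pitch_period

# Tense voice: maxima from all scales pile up at the GCI (here sample 50)
tense = mdq([50, 50, 51, 49, 50], gci=50, pitch_period=80)
# Breathy voice: maxima scatter away from the GCI
breathy = mdq([44, 58, 50, 63, 39], gci=50, pitch_period=80)
```

The maxima locations and pitch period above are invented toy values; in practice they would come from a dyadic wavelet analysis of the linear-prediction residual and from a pitch tracker, respectively.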

87 citations

Journal ArticleDOI
TL;DR: It is shown that the closed phase behavior of the excitation at different loudness levels can be seen in the temporal variation of spectral energy in the low frequency (LF) (<400 Hz) region.
Abstract: In this paper, the characteristics of speech produced at different loudness levels are analyzed in terms of changes in the glottal excitation. Four loudness levels are considered in this study: soft, normal, loud, and shout. The distinct changes in the excitation of the shout signal are analyzed using electroglottograph signals. The open and closed phases of the glottal vibration are distinctly different for shout signals compared with normal speech. It is generally difficult to derive the glottal pulse information from the speech signal due to limitations of inverse filtering. Hence, the effects of changes in the excitation are examined by analyzing the speech signal using methods that can capture the temporal variations of the spectral features, in particular the recently proposed methods of zero-frequency filtering and zero-time liftering. It is shown that the closed-phase behavior of the excitation at different loudness levels can be seen in the temporal variation of spectral energy in the low-frequency (LF, <400 Hz) region. The ratio of the LF to high-frequency energy clearly discriminates the speech produced at different loudness levels. These distinctions in the excitation features are also observed in different vowel contexts and across several speakers.
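The LF-to-high-frequency energy ratio mentioned at the end can be illustrated with a plain DFT. The 400 Hz cutoff is the paper's; the signals and sampling rate here are toy stand-ins, not its zero-frequency-filtered features:

```python
import cmath
import math

def band_energy(x, fs, f_lo, f_hi):
    """Spectral energy between f_lo and f_hi (Hz), from a plain DFT."""
    n = len(x)
    e = 0.0
    for k in range(1, n // 2):
        f = k * fs / n
        if f_lo <= f < f_hi:
            e += abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                         for t in range(n))) ** 2
    return e

def lf_hf_ratio(x, fs, cutoff=400.0):
    """Ratio of low-frequency (< cutoff Hz) to high-frequency spectral energy,
    the loudness-level discriminator described in the abstract."""
    lf = band_energy(x, fs, 0.0, cutoff)
    hf = band_energy(x, fs, cutoff, fs / 2)
    return lf / (hf + 1e-12)

fs = 8000
n = 256
# "Soft" voice proxy: energy concentrated at 125 Hz (below the 400 Hz cutoff)
soft = [math.sin(2 * math.pi * 125 * t / fs) for t in range(n)]
# "Shout" proxy: a strong extra component at 1 kHz (above the cutoff)
shout = [soft[t] + 2 * math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]
```

Louder, tenser phonation shifts energy upward in frequency, so its LF/HF ratio drops relative to soft phonation, which is the discrimination the paper reports.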

61 citations

Journal ArticleDOI
TL;DR: The impact of vocal effort variability on the performance of an isolated-word recognizer is shown and effective means of improving the system's robustness are tested.

51 citations

Journal ArticleDOI
TL;DR: By fusing participants' systems, it is shown that binary classification of alcoholisation and sleepiness from short-term observations, i.e., single utterances, can each reach over 72% accuracy on unseen test data; it is also demonstrated that these medium-term states can be recognised more robustly by fusing short-term classifiers along the time axis.

50 citations

Journal ArticleDOI
TL;DR: The evaluation results show that the synthesized voices with varying vocal effort are rated similarly to their natural counterparts both in terms of intelligibility and suitability.

40 citations