Author

Petr Zelinka

Bio: Petr Zelinka is an academic researcher from Brno University of Technology. The author has contributed to research in topics including statistical models and hidden Markov models, has an h-index of 5, and has co-authored 9 publications receiving 87 citations.

Papers
Journal ArticleDOI
TL;DR: The impact of vocal effort variability on the performance of an isolated-word recognizer is shown and effective means of improving the system's robustness are tested.

51 citations

Journal ArticleDOI
TL;DR: A novel approach to nonstationary acoustical noise modeling via a set of hierarchically tied hidden Markov models in a classification tree structure is described, which allows detailed description of nonstationary ambient acoustical noise while maintaining low computational costs during recognition.
Abstract: Noise robustness is a key issue in successful deployment of automatic speech recognition systems in demanding environments such as hospital operating rooms. Perhaps the most successful way to overcome the additive noise obstacle is to employ a model adaptation scheme built around a set of dedicated clean-speech and noise-only statistical models. Existing recognizer designs generally rely on relatively simple noise models, as more detailed ones would increase computational demands significantly. Simple models are, however, unable to provide accurate characterization of the highly nonstationary noise present in real-world noisy facilities and thereby provide only limited reduction in the recognizer's error rate. The present article describes a novel approach to nonstationary acoustical noise modeling via a set of hierarchically tied hidden Markov models in a classification tree structure. The proposed statistical structure allows a detailed description of nonstationary ambient acoustical noise while maintaining low computational costs during recognition. The modeling performance of the proposed construction is verified on real background noise recorded during neurosurgery in a hospital operating room.
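The model-adaptation scheme in the abstract rests on scoring incoming audio against several dedicated noise models and favoring the best-matching one. A minimal, hypothetical sketch of that selection step, using plain Gaussian-emission HMMs and the forward algorithm (the hierarchical tying and classification tree of the paper are not reproduced here; the two toy models and their parameters are invented for illustration):

```python
import math

def gauss_logpdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def forward_loglik(obs, pi, A, means, variances):
    """Log-likelihood of a 1-D observation sequence under a Gaussian-emission
    HMM, computed with the forward algorithm in the log domain."""
    n = len(pi)
    log_alpha = [math.log(pi[i]) + gauss_logpdf(obs[0], means[i], variances[i])
                 for i in range(n)]
    for t in range(1, len(obs)):
        nxt = []
        for j in range(n):
            terms = [log_alpha[i] + math.log(A[i][j]) for i in range(n)]
            m = max(terms)  # log-sum-exp over predecessor states
            nxt.append(m + math.log(sum(math.exp(v - m) for v in terms))
                       + gauss_logpdf(obs[t], means[j], variances[j]))
        log_alpha = nxt
    m = max(log_alpha)
    return m + math.log(sum(math.exp(v - m) for v in log_alpha))

# Two hypothetical noise models: a steady "hum" vs. impulsive "clatter"
hum = dict(pi=[0.9, 0.1], A=[[0.95, 0.05], [0.05, 0.95]],
           means=[0.0, 0.1], variances=[0.01, 0.01])
clatter = dict(pi=[0.5, 0.5], A=[[0.6, 0.4], [0.4, 0.6]],
               means=[0.0, 2.0], variances=[0.05, 0.5])

frames = [0.0, 0.05, -0.02, 0.01, 0.03]  # a quiet, low-variance segment
scores = {name: forward_loglik(frames, **m)
          for name, m in [("hum", hum), ("clatter", clatter)]}
best = max(scores, key=scores.get)  # the quiet segment matches "hum"
```

In the paper's scheme, the tree structure would let many such models share tied states so that scoring all of them stays cheap; here each candidate is simply scored independently.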

14 citations

Journal ArticleDOI
31 May 2011
TL;DR: Experimental results show that analysis of the glottal excitation appears to be a useful approach to providing evidence of alcohol intoxication above 1.96 ‰, and a new collection of Czech alcoholized speech consisting of phonetically identical speech data spoken in both sober and intoxicated states was created.
Abstract: A significant part of the information carried in a speech signal refers to the speaker. This paper deals with investigating alcohol intoxication based on analyzing the recorded speech signal. Speech changes resulting from alcohol intoxication were investigated in the waveform of glottal pulses estimated from speech by applying Iterative Adaptive Inverse Filtering (IAIF). Experimental results show that analysis of the glottal excitation appears to be a useful approach to providing evidence of alcohol intoxication above 1.96 ‰. At this alcohol level, the associated negative effects influence professional performance and may in some cases lead to fatal accidents. By analyzing the speech signal, the speaker can be monitored automatically without their active co-operation. For use in our experiments, a new collection of Czech alcoholized speech was created, consisting of phonetically identical speech data spoken in both sober and intoxicated states. http://dx.doi.org/10.5755/j01.itc.40.2.429
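IAIF iteratively alternates between estimating the vocal-tract filter and the glottal contribution; the core operation in every pass is LPC inverse filtering. A simplified, single-pass sketch of that operation (not the full IAIF algorithm), using a toy all-pole "vocal tract" driven by a glottal-like pulse train:

```python
import math

def autocorr(x, maxlag):
    return [sum(x[t] * x[t + k] for t in range(len(x) - k))
            for k in range(maxlag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns prediction coefficients a[1..order]
    such that x[t] is predicted by sum_k a[k] * x[t - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def inverse_filter(x, a):
    """Prediction residual x[t] minus its LPC prediction: approximates the
    excitation once the all-pole (vocal-tract) resonances are removed."""
    return [x[t] - sum(a[k] * x[t - 1 - k] for k in range(min(len(a), t)))
            for t in range(len(x))]

# Toy "voiced speech": an AR(2) resonator driven by a glottal-like pulse train
pulses = [1.0 if t % 80 == 0 else 0.0 for t in range(400)]
x = [0.0, 0.0]
for t in range(2, 400):
    x.append(1.3 * x[t - 1] - 0.8 * x[t - 2] + pulses[t])

a = levinson(autocorr(x, 2), 2)  # recovers roughly [1.3, -0.8]
residual = inverse_filter(x, a)  # the pulse train re-emerges at t = 80, 160, ...
```

In full IAIF the residual would be further integrated (lip-radiation cancellation) and the process repeated with refined filter orders; this sketch stops at the first whitening step.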

12 citations

Proceedings ArticleDOI
07 Oct 2010
TL;DR: The severe impact of vocal effort variability on the accuracy of a speaker-dependent word recognizer is presented, and an efficient remedial measure using a multiple-model framework paired with an accurate speech mode detector is proposed.
Abstract: This paper describes an approach for enhancing the robustness of an isolated-word recognizer by extending its flexibility in the domain of the speaker's variable vocal effort level. An analysis of the spectral properties of spoken vowels in four speaking modes (whispered, soft, normal, and loud) confirms consistent spectral tilt changes. The severe impact of vocal effort variability on the accuracy of a speaker-dependent word recognizer is presented, and an efficient remedial measure using a multiple-model framework paired with an accurate speech mode detector is proposed.
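The spectral tilt referred to above can be quantified, for illustration, as the least-squares slope of the log-magnitude spectrum. A hedged sketch under that assumption, with toy signals standing in for vowels spoken at different effort levels:

```python
import cmath
import math

def log_magnitude_spectrum(x):
    """Magnitude spectrum in dB via a plain DFT (positive-frequency bins)."""
    n = len(x)
    return [20 * math.log10(abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                                    for t in range(n))) + 1e-12)
            for k in range(1, n // 2)]

def spectral_tilt(x):
    """Least-squares slope of the dB spectrum against bin index; more negative
    means energy falls off faster with frequency."""
    y = log_magnitude_spectrum(x)
    m = len(y)
    idx = list(range(m))
    mean_i = sum(idx) / m
    mean_y = sum(y) / m
    num = sum((i - mean_i) * (v - mean_y) for i, v in zip(idx, y))
    den = sum((i - mean_i) ** 2 for i in idx)
    return num / den

# Toy signals: a one-sample click (flat spectrum, tilt ~ 0) vs. a single
# low-frequency tone (energy concentrated at the bottom, negative tilt)
click = [1.0] + [0.0] * 63
soft = [math.sin(2 * math.pi * 2 * t / 64) for t in range(64)]
```

A mode detector in the spirit of the paper could threshold such a tilt measure (together with other features) to pick the matching acoustic model from the multiple-model set.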

10 citations

Proceedings ArticleDOI
01 Jan 2010

5 citations


Cited by
Journal ArticleDOI
TL;DR: The proposed Maxima Dispersion Quotient parameter is designed to quantify the extent of this dispersion and is shown to compare favorably to existing voice quality parameters, particularly for the analysis of continuous speech.
Abstract: This paper proposes a new parameter, the Maxima Dispersion Quotient (MDQ), for differentiating breathy from tense voice. Maxima derived following wavelet decomposition are often used for detecting edges in image processing, where the maxima cluster in the vicinity of the edge. Similarly, for tense voice, which typically displays sharp glottal closing characteristics, maxima following wavelet analysis are concentrated in the vicinity of the glottal closure instant (GCI). By contrast, as the phonation type tends away from tense voice towards breathier phonation, the maxima become increasingly dispersed. The MDQ parameter is designed to quantify the extent of this dispersion and is shown to compare favorably to existing voice quality parameters, particularly for the analysis of continuous speech. Classification experiments also reveal a significant improvement in the detection of the voice qualities when MDQ is included as an input to the classifier. Finally, MDQ is shown to be robust to additive noise down to a signal-to-noise ratio of 10 dB.
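The quantification step described above — turning per-scale wavelet maxima locations into a single dispersion number — can be sketched as follows. The wavelet decomposition and GCI detection are assumed to have been done already, and the normalization by pitch period is a simplified reading of the abstract, not the paper's exact formula:

```python
def mdq(maxima_locations, gci, pitch_period):
    """Maxima Dispersion Quotient (sketch): average distance of the per-scale
    wavelet maxima from the glottal closure instant, normalized by the pitch
    period. Small MDQ ~ tense voice; large MDQ ~ breathy voice."""
    dispersion = sum(abs(m - gci) for m in maxima_locations) / len(maxima_locations)
    return dispersion / pitch_period

# Tense voice: maxima from all scales pile up at the GCI (here sample 50)
tense = mdq([50, 50, 51, 49, 50], gci=50, pitch_period=80)
# Breathy voice: maxima scatter away from the GCI
breathy = mdq([44, 58, 50, 63, 39], gci=50, pitch_period=80)
```

The maxima locations and pitch period above are invented toy values; in practice they would come from a dyadic wavelet analysis of the linear-prediction residual and from a pitch tracker, respectively.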

87 citations

Journal ArticleDOI
TL;DR: It is shown that the closed phase behavior of the excitation at different loudness levels can be seen in the temporal variation of spectral energy in the low frequency (LF) (<400 Hz) region.
Abstract: In this paper, the characteristics of speech produced at different loudness levels are analyzed in terms of changes in the glottal excitation. Four loudness levels are considered in this study: soft, normal, loud, and shout. The distinct changes in the excitation of the shout signal are analyzed using electroglottograph signals. The open and closed phases of the glottal vibration are distinctly different for shout signals compared with normal speech. It is generally difficult to derive the glottal pulse information from the speech signal due to limitations of inverse filtering. Hence, the effects of changes in the excitation are examined by analyzing the speech signal using methods that can capture the temporal variations of the spectral features, in particular the recently proposed methods of zero-frequency filtering and zero-time liftering. It is shown that the closed-phase behavior of the excitation at different loudness levels can be seen in the temporal variation of spectral energy in the low-frequency (LF, <400 Hz) region. The ratio of the LF to high-frequency energy clearly discriminates the speech produced at different loudness levels. These distinctions in the excitation features are also observed in different vowel contexts and across several speakers.
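The LF-to-high-frequency energy ratio mentioned at the end can be illustrated with a plain DFT. The 400 Hz cutoff is the paper's; the signals and sampling rate here are toy stand-ins, not its zero-frequency-filtered features:

```python
import cmath
import math

def band_energy(x, fs, f_lo, f_hi):
    """Spectral energy between f_lo and f_hi (Hz), from a plain DFT."""
    n = len(x)
    e = 0.0
    for k in range(1, n // 2):
        f = k * fs / n
        if f_lo <= f < f_hi:
            e += abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                         for t in range(n))) ** 2
    return e

def lf_hf_ratio(x, fs, cutoff=400.0):
    """Ratio of low-frequency (< cutoff Hz) to high-frequency spectral energy,
    the loudness-level discriminator described in the abstract."""
    lf = band_energy(x, fs, 0.0, cutoff)
    hf = band_energy(x, fs, cutoff, fs / 2)
    return lf / (hf + 1e-12)

fs = 8000
n = 256
# "Soft" voice proxy: energy concentrated at 125 Hz (below the 400 Hz cutoff)
soft = [math.sin(2 * math.pi * 125 * t / fs) for t in range(n)]
# "Shout" proxy: a strong extra component at 1 kHz (above the cutoff)
shout = [soft[t] + 2 * math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]
```

Louder, tenser phonation shifts energy upward in frequency, so its LF/HF ratio drops relative to soft phonation, which is the discrimination the paper reports.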

61 citations

Journal ArticleDOI
TL;DR: The impact of vocal effort variability on the performance of an isolated-word recognizer is shown and effective means of improving the system's robustness are tested.

51 citations

Journal ArticleDOI
TL;DR: By fusing participants' systems, it is shown that binary classification of alcoholisation and sleepiness from short-term observations, i.e., single utterances, can each reach over 72% accuracy on unseen test data; it is also demonstrated that these medium-term states can be recognised more robustly by fusing short-term classifiers along the time axis.

50 citations

Journal ArticleDOI
TL;DR: The evaluation results show that the synthesized voices with varying vocal effort are rated similarly to their natural counterparts both in terms of intelligibility and suitability.

40 citations