
Showing papers by "Tobias Bocklet" published in 2012


Proceedings ArticleDOI
09 Sep 2012
TL;DR: The record (EPFL-CONF-174360) concerns computational paralinguistics, with speaker traits, personality, likability, and pathology as its keywords.
Abstract: Keywords: Computational Paralinguistics; Speaker Traits; Personality; Likability; Pathology. Reference: EPFL-CONF-174360. Record created on 2012-01-23, modified on 2017-05-10.

240 citations


Journal ArticleDOI
TL;DR: This work introduces a purely data-driven system for the acoustic analysis of pathologic voices based on recordings of a standard text and assumes that for selected evaluation criteria, the system can serve as a validated objective support for acoustic voice and speech analysis.

35 citations


Journal ArticleDOI
TL;DR: Authors from the Pattern Recognition Lab, Technical Faculty, Friedrich-Alexander-University Erlangen-Nuremberg, and the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital, Ludwig-Maximilians-University Munich, present a study on pattern recognition and its applications in head and neck surgery.

22 citations


Proceedings ArticleDOI
25 Mar 2012
TL;DR: Experiments on the RM and WSJ corpora show that while a classical semi-continuous system does not perform as well as a continuous one, multiple-codebook semi-continuous systems can perform better, particularly when using full-covariance Gaussians.
Abstract: In the past decade, semi-continuous hidden Markov models (SCHMMs) have not attracted much attention in the speech recognition community. Growing amounts of training data and increasing sophistication of model estimation led to the impression that continuous HMMs are the best choice of acoustic model. However, recent work on recognition of under-resourced languages faces the same old problem of estimating a large number of parameters from limited amounts of transcribed speech. This has led to a renewed interest in methods of reducing the number of parameters while maintaining or extending the modeling capabilities of continuous models. In this work, we compare classic and multiple-codebook semi-continuous models using diagonal and full covariance matrices with continuous HMMs and subspace Gaussian mixture models. Experiments on the RM and WSJ corpora show that while a classical semi-continuous system does not perform as well as a continuous one, multiple-codebook semi-continuous systems can perform better, particularly when using full-covariance Gaussians.

18 citations
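
The parameter sharing that the entry above attributes to semi-continuous HMMs can be illustrated with a minimal sketch. It assumes a single shared codebook of diagonal-covariance Gaussians and strictly positive, state-specific mixture weights; the function names and array shapes are hypothetical and are not taken from the paper, which also studies multiple codebooks and full covariances.

```python
import numpy as np

def log_gaussians_diag(x, means, variances):
    """Log density of x under each diagonal Gaussian in the shared codebook.

    x: (D,), means: (K, D), variances: (K, D). Returns shape (K,).
    """
    diff = x - means
    return -0.5 * (
        np.sum(np.log(2.0 * np.pi * variances), axis=1)
        + np.sum(diff ** 2 / variances, axis=1)
    )

def semicontinuous_log_likelihoods(x, means, variances, state_weights):
    """Log emission likelihood of x for every HMM state.

    All states share the same K Gaussians; only the (S, K) weight matrix
    is state specific. Weights are assumed strictly positive (illustrative
    simplification) so that np.log is finite.
    """
    log_dens = log_gaussians_diag(x, means, variances)   # computed once per frame
    scores = np.log(state_weights) + log_dens            # (S, K)
    peak = scores.max(axis=1, keepdims=True)             # log-sum-exp for stability
    return (peak + np.log(np.sum(np.exp(scores - peak), axis=1, keepdims=True))).ravel()
```

Because the Gaussian densities are evaluated once per frame and reused by every state, adding a state costs only K extra weights, which is the parameter saving that makes such models attractive when transcribed speech is limited.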


Journal Article
TL;DR: Patients benefit from the fabrication of new dentures in terms of speech intelligibility, regardless of the type of prosthesis; however, telescopic crown prostheses yield significantly better speech quality than complete dentures.
Abstract: PURPOSE: A completely edentulous or partially edentulous maxilla involving missing anterior teeth may impact speech production and lead to reduced speech intelligibility. The aim of this study was to prospectively evaluate the effect of a dental prosthetic rehabilitation on speech intelligibility in patients with a toothless or interrupted maxillary arch by means of an automatic, standardized speech recognition system. MATERIALS AND METHODS: The speech intelligibility of 45 patients with complete tooth loss or a loss including missing anterior teeth in the maxilla was evaluated by means of a polyphone-based automatic speech recognition system that assessed the percentage of correctly recognized words (word accuracy). To replace inadequate maxillary removable dentures, 20 patients from the overall sample had been rehabilitated with complete dentures and 25 patients with telescopic prostheses. Speech recordings were made in four recording sessions (with and without existing prostheses and then at 1 week and 6 months after placement of newly fabricated prostheses). RESULTS: Significantly higher speech intelligibility was observed in both patient groups compared to the original results without the dentures inserted. After 6 months of adaptation, both groups had reached a level of speech quality that was comparable to the healthy control group. However, patients receiving new telescopic prostheses showed significantly higher levels of speech intelligibility compared to those receiving new complete dentures. Within 6 months, speech intelligibility did not significantly improve from the level found 1 week after insertion of new prostheses for both groups. CONCLUSION: Patients benefit from the fabrication of new dentures in terms of speech intelligibility, regardless of the type of prosthesis. However, telescopic crown prostheses yield significantly better speech quality compared to complete dentures.

13 citations
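
The word accuracy measure mentioned in the entry above can be sketched as follows. The abstract describes it only as the percentage of correctly recognized words; the sketch assumes the conventional definition used in speech recognition, WA = 100 * (N - S - D - I) / N, computed from a Levenshtein alignment of the reference text and the recognizer output. The function names are hypothetical, and the study's actual scoring tool is not specified in the source.

```python
def edit_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimal alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total errors, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            total, subs, dels, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (total, subs, dels, ins)
            else:
                diagonal = (total + 1, subs + 1, dels, ins)
            total, subs, dels, ins = cost[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = cost[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            cost[i][j] = min(diagonal, deletion, insertion)
    return cost[n][m][1:]

def word_accuracy(reference, hypothesis):
    """Word accuracy in percent; can be negative when insertions dominate."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = edit_counts(reference, hypothesis)
    return 100.0 * (len(reference) - subs - dels - ins) / len(reference)
```

For example, `word_accuracy("the north wind and the sun".split(), "the north wind in the sun".split())` scores one substitution against six reference words.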


01 Sep 2012
TL;DR: An automatic system to detect sigmatism from the speech signal is proposed and integrated into a Java applet that allows patients to record their own speech, either by pronouncing isolated phones, a specific word, or a list of words, and gives them feedback on whether the sibilant phones are pronounced correctly.
Abstract: We propose in this paper an automatic system to detect sigmatism from the speech signal. Sigmatism occurs when the tongue is positioned incorrectly during articulation of sibilant phones like /s/ and /z/. For our task we extracted various sets of features from speech: Mel frequency cepstral coefficients, energies in specific bandwidths of the spectral envelope, and the so-called supervectors, which are the parameters of an adapted speaker model. We then trained several classifiers on a speech database of German adults simulating three different types of sigmatism. Recognition results were calculated at the phone, word, and speaker levels for both the simulated database and a database of pathological speakers. For the simulated database, we achieved recognition rates of up to 86%, 87%, and 94% at the phone, word, and speaker levels, respectively. The best classifier was then integrated into a Java applet that allows patients to record their own speech, either by pronouncing isolated phones, a specific word, or a list of words, and gives them feedback on whether the sibilant phones are pronounced correctly.

13 citations


Proceedings ArticleDOI
09 Sep 2012
TL;DR: This paper combines a large prosodic feature vector with features derived from a Gaussian Mixture Model used as Universal Background Model (GMM-UBM) and with features from openSMILE, an open-source toolkit for extracting acoustic features, to assess the quality of L2 learners' utterances with respect to sentence melody and rhythm.
Abstract: In earlier studies, we employed a large prosodic feature vector to assess the quality of L2 learners' utterances with respect to sentence melody and rhythm. In this paper, we combine these features with two standard approaches in paralinguistic analysis: (1) features derived from a Gaussian Mixture Model used as Universal Background Model (GMM-UBM), and (2) openSMILE, an open-source toolkit for extracting acoustic features. We evaluate our approach with English speech from 94 non-native speakers perceptually scored by 62 native labellers. GMM-UBM or openSMILE modelling alone yields lower performance than our prosodic feature vector; however, adding information from the GMM-UBM modelling or openSMILE by late fusion improves results.

11 citations
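
Late fusion, as mentioned in the entry above, combines the outputs of separately trained systems rather than their input features. The abstract does not say how the fusion is performed, so the following is only a generic sketch that assumes a weighted average of per-utterance scores; the function name, the weighting scheme, and the example values are hypothetical.

```python
import numpy as np

def late_fusion(system_scores, weights=None):
    """Fuse per-utterance scores from several systems by weighted averaging.

    system_scores: array-like of shape (n_systems, n_utterances), one row per
    system (for instance prosodic, GMM-UBM, and openSMILE based predictors).
    weights: optional non-negative weights, one per system; they are
    normalised to sum to one. Equal weights are used when omitted.
    """
    scores = np.asarray(system_scores, dtype=float)
    if weights is None:
        weights = np.ones(scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ scores

# Illustrative values only: two systems scoring two utterances.
fused = late_fusion([[1.0, 2.0], [3.0, 4.0]], weights=[3.0, 1.0])
```

In practice the weights would be tuned on held-out data, and the individual scores may need to be brought to a common scale before averaging.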