Author

Tobias Bocklet

Bio: Tobias Bocklet is an academic researcher from Intel. The author has contributed to research in topics: Speaker recognition & Computer science. The author has an h-index of 19 and has co-authored 86 publications receiving 1286 citations. Previous affiliations of Tobias Bocklet include University of Erlangen-Nuremberg & SRI International.


Papers
Proceedings ArticleDOI
09 Sep 2012
TL;DR: A computational paralinguistics contribution on speaker traits, covering personality, likability, and pathology.
Abstract: Keywords: Computational Paralinguistics; Speaker Traits; Personality; Likability; Pathology. Reference: EPFL-CONF-174360.

240 citations

Proceedings ArticleDOI
12 May 2008
TL;DR: This paper compares two approaches to automatic age and gender classification with 7 classes: Gaussian mixture models with universal background models, well known from speaker identification/verification, and GMM supervectors classified with support vector machines.
Abstract: This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian mixture models (GMMs) with universal background models (UBMs), which are well known for the task of speaker identification/verification. Training is performed by the EM algorithm or MAP adaptation, respectively. In the second approach, a GMM is trained for each speaker in the training and test sets. The means of each model are extracted and concatenated, resulting in a GMM supervector per speaker. These supervectors are then classified with a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different degrees), an RBF kernel, and a linear GMM distance kernel based on the KL divergence. With the SVM approach the recognition rate improved to 74% (p < 0.001), which is in the same range as human performance.
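The supervector idea from the abstract can be sketched compactly. Below is a minimal, hedged Python/scikit-learn illustration, not the paper's implementation: it assumes precomputed MFCC matrices per speaker and fits independent per-speaker GMMs, whereas the paper MAP-adapts each speaker model from a shared UBM so that mixture components stay aligned across speakers.

```python
# Sketch of the GMM-supervector + SVM approach (simplified; see lead-in).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def supervector(frames, n_components=16):
    """Fit a per-speaker GMM and concatenate its means into one vector.

    NOTE: the paper derives speaker GMMs by MAP adaptation from a UBM,
    which keeps components aligned across speakers; independent EM fits,
    as here, do not guarantee that alignment.
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag").fit(frames)
    return gmm.means_.ravel()  # shape: (n_components * n_features,)

# train_frames / test_frames: lists of (n_frames, n_mfcc) arrays, one per speaker
def classify(train_frames, train_labels, test_frames):
    X_train = np.vstack([supervector(f) for f in train_frames])
    X_test = np.vstack([supervector(f) for f in test_frames])
    svm = SVC(kernel="rbf")         # the paper also tried polynomial kernels
    svm.fit(X_train, train_labels)  # and a KL-based GMM distance kernel
    return svm.predict(X_test)
```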

119 citations

Journal ArticleDOI
TL;DR: This is the first software with the characteristics described above, and it is expected to help other researchers contribute to the state of the art in pathological speech assessment from different perspectives: from the clinical point of view for interpretation, and from the computer science point of view by enabling the testing of different measures and pattern recognition techniques.

75 citations

Proceedings ArticleDOI
01 Dec 2011
TL;DR: It is shown that read texts and monologues are the most meaningful texts when it comes to the automatic detection of PD based on articulation, voice, and prosodic evaluations.
Abstract: 70% to 90% of patients with Parkinson's disease (PD) have an affected voice, and various studies have revealed that voice and prosody are among the earliest indicators of PD. The aim of this study is to automatically detect whether the speech/voice of a person is affected by PD. We employ acoustic features, prosodic features, and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts, and monologues. Classification is performed in each case by SVMs. A correlation-based feature selection was performed in order to identify the most important features for each of these systems. We report recognition results of 91% when differentiating between normal-speaking persons and speakers with early-stage PD using prosodic modeling; with acoustic modeling we achieved a recognition rate of 88%, and with vocal-fold modeling 79%. After feature selection these results improved greatly, although we expect them to be too optimistic. We show that read texts and monologues are the most meaningful texts for the automatic detection of PD based on articulation, voice, and prosodic evaluations. The most important prosodic features were based on energy, pauses, and F0. The masses and the spring compliances were found to be the most important parameters of the two-mass vocal fold model.
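A hedged sketch of the classification pipeline described above, in Python with scikit-learn. Feature extraction is assumed to happen elsewhere; since scikit-learn has no correlation-based feature selection (CFS) as used in the paper, SelectKBest with the ANOVA F-score stands in for it here.

```python
# Sketch: feature selection + SVM for PD-vs-control classification.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def pd_classifier(k_features=20):
    return make_pipeline(
        StandardScaler(),                      # SVMs need scaled features
        SelectKBest(f_classif, k=k_features),  # stand-in for the paper's CFS
        SVC(kernel="rbf"),
    )

# X: (n_speakers, n_features) acoustic/prosodic vectors; y: 0 = control, 1 = PD
# scores = cross_val_score(pd_classifier(), X, y, cv=5)
```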

73 citations

Journal ArticleDOI
TL;DR: The proposed approach may help clinicians make more accurate and timely decisions about the evaluation and therapy associated with a patient's dysarthria level, and is a great step towards unobtrusive/ecological evaluation of patients with dysarthric speech without the need to attend medical appointments.

68 citations


Cited by
Journal ArticleDOI
TL;DR: This paper starts with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling, and elaborates on advanced computational techniques that address robustness and session variability.
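To make the surveyed front-end concrete: a minimal sketch of the classic MFCC + per-speaker GMM pipeline such fundamentals cover; file names and model sizes are placeholders, not from the paper.

```python
# Sketch: MFCC front-end and a GMM speaker model scored by log-likelihood.
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

enroll = mfcc_frames("enrollment.wav")  # placeholder file names
trial = mfcc_frames("unknown.wav")

model = GaussianMixture(n_components=32, covariance_type="diag").fit(enroll)
print(model.score(trial))  # average per-frame log-likelihood of the claim
```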

1,433 citations

Journal ArticleDOI
TL;DR: A basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis, is proposed and intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters.
Abstract: Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size.
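The parameter set described above ships with openSMILE, and audEERING's opensmile Python package exposes it directly. A small usage sketch, assuming that package is installed (the eGeMAPSv02 enum name is taken from its documentation):

```python
# Extract the extended GeMAPS functionals for one recording.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("recording.wav")  # placeholder path
print(features.shape)  # one row of utterance-level functionals
```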

1,158 citations

Journal ArticleDOI
TL;DR: A thorough examination of the different studies that have been conducted since 2006, when deep learning first arose as a new area of machine learning, for speech applications is provided.
Abstract: Over the past decades, a tremendous amount of research has been done on the use of machine learning for speech processing applications, especially speech recognition. In the past few years, however, research has focused on utilizing deep learning for speech-related applications. This new area of machine learning has yielded far better results than others in a variety of applications, including speech, and has thus become a very attractive area of research. This paper provides a thorough examination of the different studies that have been conducted since 2006, when deep learning first arose as a new area of machine learning, for speech applications. A thorough statistical analysis, conducted by extracting specific information from 174 papers published between 2006 and 2018, is provided in this review. The results shed light on the trends of research in this area and bring focus to new research topics.

701 citations

Proceedings ArticleDOI
25 Aug 2013
TL;DR: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech; it further introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech.
Abstract: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader range of twelve enacted emotional states overall. In this paper, we describe these four Sub-Challenges, their conditions, baselines, and a new feature set by the openSMILE toolkit, provided to the participants. Index Terms: Computational Paralinguistics, Challenge, Social Signals, Conflict, Emotion, Autism

694 citations