Author

Tobias Bocklet

Bio: Tobias Bocklet is an academic researcher from Intel. The author has contributed to research in topics: Speaker recognition & Computer science. The author has an h-index of 19 and has co-authored 86 publications receiving 1286 citations. Previous affiliations of Tobias Bocklet include University of Erlangen-Nuremberg & SRI International.


Papers
Proceedings ArticleDOI
09 Sep 2012
TL;DR: A computational paralinguistics contribution on speaker traits, covering personality, likability, and pathology.
Abstract: Keywords: Computational Paralinguistics; Speaker Traits; Personality; Likability; Pathology. Reference: EPFL-CONF-174360.

240 citations

Proceedings ArticleDOI
12 May 2008
TL;DR: This paper compares two approaches to automatic age and gender classification with 7 classes: Gaussian mixture models with universal background models, well known from speaker identification/verification, and GMM supervectors classified with support vector machines.
Abstract: This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian mixture models (GMMs) with universal background models (UBMs), which are well known for the task of speaker identification/verification. Training is performed by the EM algorithm or MAP adaptation, respectively. In the second approach, a GMM is trained for each speaker in the training and test sets. The means of each model are extracted and concatenated, resulting in a GMM supervector per speaker. These supervectors are then classified with a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different degrees), an RBF kernel, and a linear GMM distance kernel based on the KL divergence. With the SVM approach the recognition rate improved to 74% (p < 0.001), which is in the same range as human performance.
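The supervector idea from the abstract can be sketched compactly. Below is a minimal, hedged Python/scikit-learn illustration, not the paper's implementation: it assumes precomputed MFCC matrices per speaker and fits independent per-speaker GMMs, whereas the paper MAP-adapts each speaker model from a shared UBM so that mixture components stay aligned across speakers.

```python
# Sketch of the GMM-supervector + SVM approach (simplified; see lead-in).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def supervector(frames, n_components=16):
    """Fit a per-speaker GMM and concatenate its means into one vector.

    NOTE: the paper derives speaker GMMs by MAP adaptation from a UBM,
    which keeps components aligned across speakers; independent EM fits,
    as here, do not guarantee that alignment.
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag").fit(frames)
    return gmm.means_.ravel()  # shape: (n_components * n_features,)

# train_frames / test_frames: lists of (n_frames, n_mfcc) arrays, one per speaker
def classify(train_frames, train_labels, test_frames):
    X_train = np.vstack([supervector(f) for f in train_frames])
    X_test = np.vstack([supervector(f) for f in test_frames])
    svm = SVC(kernel="rbf")         # the paper also tried polynomial kernels
    svm.fit(X_train, train_labels)  # and a KL-based GMM distance kernel
    return svm.predict(X_test)
```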

119 citations

Journal ArticleDOI
TL;DR: This is the first software with the characteristics described above, and it is expected to help other researchers contribute to the state of the art in pathological speech assessment from different perspectives: from the clinical point of view for interpretation, and from the computer science point of view by enabling the testing of different measures and pattern recognition techniques.

75 citations

Proceedings ArticleDOI
01 Dec 2011
TL;DR: It is shown that read texts and monologues are the most meaningful texts when it comes to the automatic detection of PD based on articulation, voice, and prosodic evaluations.
Abstract: 70% to 90% of patients with Parkinson's disease (PD) have an affected voice, and various studies have revealed that voice and prosody are among the earliest indicators of PD. The aim of this study is to automatically detect whether the speech/voice of a person is affected by PD. We employ acoustic features, prosodic features, and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts, and monologues. Classification is performed in each case by SVMs. A correlation-based feature selection was performed in order to identify the most important features for each of these systems. We report recognition results of 91% when differentiating between normal-speaking persons and speakers with early-stage PD using prosodic modeling; with acoustic modeling we achieved a recognition rate of 88%, and with vocal-fold modeling 79%. After feature selection these results improved greatly, although we expect them to be too optimistic. We show that read texts and monologues are the most meaningful texts for the automatic detection of PD based on articulation, voice, and prosodic evaluations. The most important prosodic features were based on energy, pauses, and F0. The masses and the spring compliances were found to be the most important parameters of the two-mass vocal fold model.
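A hedged sketch of the classification pipeline described above, in Python with scikit-learn. Feature extraction is assumed to happen elsewhere; since scikit-learn has no correlation-based feature selection (CFS) as used in the paper, SelectKBest with the ANOVA F-score stands in for it here.

```python
# Sketch: feature selection + SVM for PD-vs-control classification.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def pd_classifier(k_features=20):
    return make_pipeline(
        StandardScaler(),                      # SVMs need scaled features
        SelectKBest(f_classif, k=k_features),  # stand-in for the paper's CFS
        SVC(kernel="rbf"),
    )

# X: (n_speakers, n_features) acoustic/prosodic vectors; y: 0 = control, 1 = PD
# scores = cross_val_score(pd_classifier(), X, y, cv=5)
```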

73 citations

Journal ArticleDOI
TL;DR: The proposed approach may help clinicians make more accurate and timely decisions about the evaluation and therapy associated with a patient's dysarthria level, and is a great step towards unobtrusive/ecological evaluation of patients with dysarthric speech without the need to attend medical appointments.

68 citations


Cited by
Journal ArticleDOI
TL;DR: This paper starts with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling, and elaborates on advanced computational techniques that address robustness and session variability.
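To make the surveyed front-end concrete: a minimal sketch of the classic MFCC + per-speaker GMM pipeline such fundamentals cover; file names and model sizes are placeholders, not from the paper.

```python
# Sketch: MFCC front-end and a GMM speaker model scored by log-likelihood.
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

enroll = mfcc_frames("enrollment.wav")  # placeholder file names
trial = mfcc_frames("unknown.wav")

model = GaussianMixture(n_components=32, covariance_type="diag").fit(enroll)
print(model.score(trial))  # average per-frame log-likelihood of the claim
```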

1,433 citations

Journal ArticleDOI
TL;DR: A basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis, is proposed and intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters.
Abstract: Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size.
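The parameter set described above ships with openSMILE, and audEERING's opensmile Python package exposes it directly. A small usage sketch, assuming that package is installed (the eGeMAPSv02 enum name is taken from its documentation):

```python
# Extract the extended GeMAPS functionals for one recording.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("recording.wav")  # placeholder path
print(features.shape)  # one row of utterance-level functionals
```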

1,158 citations

Journal ArticleDOI
TL;DR: A thorough examination of the different studies that have been conducted since 2006, when deep learning first arose as a new area of machine learning, for speech applications is provided.
Abstract: Over the past decades, a tremendous amount of research has been done on the use of machine learning for speech processing applications, especially speech recognition. In the past few years, however, research has focused on utilizing deep learning for speech-related applications. This new area of machine learning has yielded far better results than others in a variety of applications, including speech, and has thus become a very attractive area of research. This paper provides a thorough examination of the different studies that have been conducted since 2006, when deep learning first arose as a new area of machine learning, for speech applications. A thorough statistical analysis, conducted by extracting specific information from 174 papers published between 2006 and 2018, is provided in this review. The results shed light on the trends of research in this area and bring focus to new research topics.

701 citations

Proceedings ArticleDOI
25 Aug 2013
TL;DR: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech; it further introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech.
Abstract: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader range of twelve enacted emotional states overall. In this paper, we describe these four Sub-Challenges, their conditions, baselines, and a new feature set by the openSMILE toolkit, provided to the participants. Index Terms: Computational Paralinguistics, Challenge, Social Signals, Conflict, Emotion, Autism

694 citations