Proceedings ArticleDOI

Opensmile: the munich versatile and fast open-source audio feature extractor

25 Oct 2010, pp. 1459-1462
TL;DR: The openSMILE feature extraction toolkit is introduced, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities and has a modular, component based architecture which makes extensions via plug-ins easy.
Abstract: We introduce the openSMILE feature extraction toolkit, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities. Audio low-level descriptors such as CHROMA and CENS features, loudness, Mel-frequency cepstral coefficients, perceptual linear predictive cepstral coefficients, linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies are supported. Delta regression and various statistical functionals can be applied to the low-level descriptors. openSMILE is implemented in C++ with no third-party dependencies for the core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component based architecture which makes extensions via plug-ins easy. It supports on-line incremental processing for all implemented features as well as off-line and batch processing. Numeric compatibility with future versions is ensured by means of unit tests. openSMILE can be downloaded from http://opensmile.sourceforge.net/.
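
The pipeline the abstract describes (frame-level low-level descriptors, delta regression, statistical functionals) can be sketched in a few lines. Below is a minimal, hypothetical Python/numpy illustration, not openSMILE code: log-energy stands in for an arbitrary LLD, and the deltas use the standard HTK-style regression formula.

```python
import numpy as np

def frame_log_energy(signal, frame_len=400, hop=160):
    """Frame-level low-level descriptor: log energy per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def delta_regression(lld, n=2):
    """HTK-style delta regression coefficients over a +/- n frame window."""
    padded = np.pad(lld, n, mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    return np.array([
        sum(k * (padded[t + n + k] - padded[t + n - k]) for k in range(1, n + 1)) / denom
        for t in range(len(lld))
    ])

def functionals(contour):
    """Statistical functionals summarising a frame-level contour."""
    return {"mean": contour.mean(), "std": contour.std(),
            "min": contour.min(), "max": contour.max()}

signal = np.random.randn(16000)          # one second of 16 kHz audio (dummy)
lld = frame_log_energy(signal)           # low-level descriptor contour
delta = delta_regression(lld)            # first-order delta coefficients
features = {**functionals(lld),
            **{"de_" + k: v for k, v in functionals(delta).items()}}
print(features)
```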
Citations
Proceedings ArticleDOI
21 Oct 2013
TL;DR: openSMILE 2.0 unites feature extraction paradigms from speech, music, and general sound events with basic video features for multi-modal processing, allowing for time synchronization of parameters, on-line incremental processing as well as off-line and batch processing, and the extraction of statistical functionals (feature summaries).
Abstract: We present recent developments in the openSMILE feature extraction toolkit. Version 2.0 now unites feature extraction paradigms from speech, music, and general sound events with basic video features for multi-modal processing. Descriptors from audio and video can be processed jointly in a single framework allowing for time synchronization of parameters, on-line incremental processing as well as off-line and batch processing, and the extraction of statistical functionals (feature summaries), such as moments, peaks, regression parameters, etc. Postprocessing of the features includes statistical classifiers such as support vector machine models or file export for popular toolkits such as Weka or HTK. Available low-level descriptors include popular speech, music and video features including Mel-frequency and similar cepstral and spectral coefficients, Chroma, CENS, auditory model based loudness, voice quality, local binary pattern, color, and optical flow histograms. Besides, voice activity detection, pitch tracking and face detection are supported. openSMILE is implemented in C++, using standard open source libraries for on-line audio and video input. It is fast, runs on Unix and Windows platforms, and has a modular, component based architecture which makes extensions via plug-ins easy. openSMILE 2.0 is distributed under a research license and can be downloaded from http://opensmile.sourceforge.net/.
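
To make the functional-extraction and Weka-export step concrete, here is a hedged Python sketch: it computes a few of the functional types named above (moments, a simple peak count, linear-regression parameters) over one frame-level contour and writes them as a Weka ARFF file. The attribute names and the peak definition are illustrative assumptions, not the toolkit's actual feature names.

```python
import numpy as np

def functionals(contour):
    """A few of the functional types named above: moments, peaks, regression."""
    t = np.arange(len(contour))
    slope, offset = np.polyfit(t, contour, 1)      # linear regression parameters
    # Hypothetical peak definition: samples larger than both neighbours.
    peaks = np.sum((contour[1:-1] > contour[:-2]) & (contour[1:-1] > contour[2:]))
    return {
        "mean": contour.mean(),                     # first moment
        "variance": contour.var(),                  # second central moment
        "num_peaks": float(peaks),
        "reg_slope": slope,
        "reg_offset": offset,
    }

def write_arff(path, relation, rows):
    """Minimal Weka ARFF export of per-instance functionals."""
    names = sorted(rows[0])
    with open(path, "w") as f:
        f.write(f"@relation {relation}\n\n")
        for name in names:
            f.write(f"@attribute {name} numeric\n")
        f.write("\n@data\n")
        for row in rows:
            f.write(",".join(str(row[n]) for n in names) + "\n")

contour = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * np.random.randn(200)
write_arff("features.arff", "functionals_demo", [functionals(contour)])
```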

1,186 citations


Cites background from "Opensmile: the munich versatile and..."

  • ...Even though openSMILE originates from the audio processing domain as such – it has been featured in the 2010 ACM MM Open Source Software Competition [4] – it has recently been extended with basic video features, and, more importantly, its design is principally modality independent....


  • ...Details on the ring-buffer architecture can be found in [4] and on the project webpage....


Proceedings ArticleDOI
25 Aug 2013
TL;DR: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech, introduces conflict in group discussions as a new task, and deals with autism and its manifestations in speech.
Abstract: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech. Finally, emotion is revisited as task, albeit with a broader range of overall twelve enacted emotional states. In this paper, we describe these four Sub-Challenges, their conditions, baselines, and a new feature set by the openSMILE toolkit, provided to the participants. Index Terms: Computational Paralinguistics, Challenge, Social Signals, Conflict, Emotion, Autism

694 citations


Cites methods from "Opensmile: the munich versatile and..."

  • ...Again, we use TUM’s open-source openSMILE feature extractor [27] and provide extracted feature sets on a per-chunk level (except for SVC)....


Journal ArticleDOI
TL;DR: The basic phenomenon as reflected in the last fifteen years is addressed, with commentary on databases, modelling and annotation, the unit of analysis and prototypicality, and automatic processing, including discussions of features, classification, robustness, evaluation, and implementation and system integration.

671 citations


Cites background or methods from "Opensmile: the munich versatile and..."

  • ...Future studies will very likely address feature importance across databases (Eyben et al., 2010a) and further types of efficient feature selection (Rong et al., 2007; Altun and Polat, 2009)....


  • ...In the Classifier Sub-Challenge, participants designed their own classifiers and had to use a selection of 384 standard acoustic features, computed with the openSMILE toolkit (Eyben et al., 2009, 2010c) provided by the organisers....


  • ...The Munich open-source Emotion and Affect Recognition Toolkit (openEAR) [65] is the first of its kind to provide a free open source toolkit that integrates all three necessary components: feature extraction (by the fast openSMILE backend [66]), classifiers, and pre-trained models....


  • ...A severe issue in cross/multi-corpora studies is the inhomogeneous labelling process, which often leads to inconsistent, incompatible or even distinct emotional classes (Eyben et al., 2010a)....


  • ...In the Classifier Sub-Challenge, participants designed their own classifiers and had to use a selection of 384 standard acoustic features, computed with the openSMILE toolkit [65, 66] provided by the organisers....


Proceedings ArticleDOI
22 Apr 2013
TL;DR: A new multimodal corpus of spontaneous collaborative and affective interactions in French, RECOLA, is presented and is being made available to the research community; it includes self-report measures taken from users during task completion.
Abstract: We present in this paper a new multimodal corpus of spontaneous collaborative and affective interactions in French: RECOLA, which is being made available to the research community. Participants were recorded in dyads during a video conference while completing a task requiring collaboration. Different multimodal data, i.e., audio, video, ECG and EDA, were recorded continuously and synchronously. In total, 46 participants took part in the test, for which the first 5 minutes of interaction were kept to ease annotation. In addition to these recordings, 6 annotators measured emotion continuously on two dimensions: arousal and valence, as well as social behavior labels on five dimensions. The corpus allowed us to take self-report measures of users during task completion. Methodologies and issues related to affective corpus construction are briefly reviewed in this paper. We further detail how the corpus was constructed, i.e., participants, procedure and task, the multimodal recording setup, the annotation of data and some analysis of the quality of these annotations.
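
With six continuous traces per affective dimension, one simple way to quantify annotation quality (the paper's own analysis may differ) is mean pairwise inter-annotator correlation. The following Python sketch, with dummy data and an assumed sampling rate, illustrates the idea.

```python
import numpy as np
from itertools import combinations

def mean_pairwise_correlation(traces):
    """Mean Pearson correlation over all annotator pairs.

    traces: array of shape (n_annotators, n_samples) holding the
    continuous ratings (e.g. arousal) of each annotator over time.
    """
    corrs = [np.corrcoef(traces[i], traces[j])[0, 1]
             for i, j in combinations(range(len(traces)), 2)]
    return float(np.mean(corrs))

# Dummy data standing in for six annotators rating 5 minutes at an assumed 25 Hz.
rng = np.random.default_rng(0)
target = np.cumsum(rng.normal(size=7500))            # shared latent rating
traces = target + rng.normal(scale=5.0, size=(6, 7500))
print(f"mean pairwise r = {mean_pairwise_correlation(traces):.2f}")
```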

630 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: An LSTM-based model is proposed that enables utterances to capture contextual information from their surroundings in the same video, thus aiding the classification process and showing 5-10% performance improvement over the state of the art and high robustness to generalizability.
Abstract: Multimodal sentiment analysis is a developing area of research, which involves the identification of sentiments in videos. Current research considers utterances as independent entities, i.e., ignores the interdependencies and relations among the utterances of a video. In this paper, we propose an LSTM-based model that enables utterances to capture contextual information from their surroundings in the same video, thus aiding the classification process. Our method shows 5-10% performance improvement over the state of the art and high robustness to generalizability.
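
The central idea, letting each utterance's classification see neighbouring utterances in the same video, can be illustrated with a bidirectional LSTM over the sequence of per-utterance feature vectors. The PyTorch sketch below uses assumed dimensions and is an illustration of the approach, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ContextualUtteranceClassifier(nn.Module):
    """Classify each utterance using context from surrounding utterances."""

    def __init__(self, feat_dim=100, hidden=64, n_classes=2):
        super().__init__()
        # A bidirectional LSTM lets each utterance see past and future context.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, utterances):
        # utterances: (batch, n_utterances, feat_dim), one video per row
        context, _ = self.lstm(utterances)   # contextualised representations
        return self.out(context)             # per-utterance class scores

# One dummy video with 30 utterances of 100-dim features (assumed sizes).
model = ContextualUtteranceClassifier()
scores = model(torch.randn(1, 30, 100))
print(scores.shape)  # torch.Size([1, 30, 2])
```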

570 citations


Cites background or methods from "Opensmile: the munich versatile and..."

  • ...(Metallinou et al., 2008) and (Eyben et al., 2010a) fused audio and textual modalities for emotion recognition....


  • ...To compute the features, we use openSMILE (Eyben et al., 2010b), an open-source software that automatically extracts audio features such as pitch and voice intensity....


References
Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
  • Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
  • Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
  • Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface; algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization

20,196 citations

Book
01 Jan 2008
TL;DR: In this paper, generalized estimating equations (GEEs), with computation using PROC GENMOD in SAS, and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC are discussed.
Abstract: ...logistic regression, and it concerns studying the effect of covariates on the risk of disease. The chapter includes generalized estimating equations (GEEs), with computation using PROC GENMOD in SAS, and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC. As a prelude to the following chapter on repeated-measures data, Chapter 5 presents time series analysis. The material on repeated-measures analysis uses linear additive models with GEEs and PROC MIXED in SAS for linear mixed-effects models. Chapter 7 is about survival data analysis. All computing throughout the book is done using SAS procedures.

9,995 citations

01 Jan 2006

5,265 citations

01 Jan 2006

1,009 citations


"Opensmile: the munich versatile and..." refers methods in this paper

  • ...Related feature extraction tools used for speech research include e.g. the Hidden Markov Model Toolkit (HTK) [15], the PRAAT Software [3], the Speech Filing System (SFS), the Auditory Toolbox, a Matlab™ toolbox by Raul Fernandez [6], the Tracter framework [7], and the SNACK package for the Tcl scripting language....
