Proceedings ArticleDOI

Opensmile: the munich versatile and fast open-source audio feature extractor

25 Oct 2010, pp. 1459-1462
TL;DR: The openSMILE feature extraction toolkit is introduced, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities and has a modular, component based architecture which makes extensions via plug-ins easy.
Abstract: We introduce the openSMILE feature extraction toolkit, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities. Audio low-level descriptors such as CHROMA and CENS features, loudness, Mel-frequency cepstral coefficients, perceptual linear predictive cepstral coefficients, linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies are supported. Delta regression and various statistical functionals can be applied to the low-level descriptors. openSMILE is implemented in C++ with no third-party dependencies for the core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component based architecture which makes extensions via plug-ins easy. It supports on-line incremental processing for all implemented features as well as off-line and batch processing. Numeric compatibility with future versions is ensured by means of unit tests. openSMILE can be downloaded from http://opensmile.sourceforge.net/.
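
The pipeline the abstract describes (frame-level low-level descriptors, delta regression, statistical functionals) can be sketched in a few lines. Below is a minimal, hypothetical Python/numpy illustration, not openSMILE code: log-energy stands in for an arbitrary LLD, and the deltas use the standard HTK-style regression formula.

```python
import numpy as np

def frame_log_energy(signal, frame_len=400, hop=160):
    """Frame-level low-level descriptor: log energy per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def delta_regression(lld, n=2):
    """HTK-style delta regression coefficients over a +/- n frame window."""
    padded = np.pad(lld, n, mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    return np.array([
        sum(k * (padded[t + n + k] - padded[t + n - k]) for k in range(1, n + 1)) / denom
        for t in range(len(lld))
    ])

def functionals(contour):
    """Statistical functionals summarising a frame-level contour."""
    return {"mean": contour.mean(), "std": contour.std(),
            "min": contour.min(), "max": contour.max()}

signal = np.random.randn(16000)          # one second of 16 kHz audio (dummy)
lld = frame_log_energy(signal)           # low-level descriptor contour
delta = delta_regression(lld)            # first-order delta coefficients
features = {**functionals(lld),
            **{"de_" + k: v for k, v in functionals(delta).items()}}
print(features)
```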
Citations
Proceedings ArticleDOI
21 Oct 2013
TL;DR: openSMILE 2.0 unites feature extraction paradigms from speech, music, and general sound events with basic video features for multi-modal processing, allowing for time synchronization of parameters, on-line incremental processing as well as off-line and batch processing, and the extraction of statistical functionals (feature summaries).
Abstract: We present recent developments in the openSMILE feature extraction toolkit. Version 2.0 now unites feature extraction paradigms from speech, music, and general sound events with basic video features for multi-modal processing. Descriptors from audio and video can be processed jointly in a single framework allowing for time synchronization of parameters, on-line incremental processing as well as off-line and batch processing, and the extraction of statistical functionals (feature summaries), such as moments, peaks, regression parameters, etc. Postprocessing of the features includes statistical classifiers such as support vector machine models or file export for popular toolkits such as Weka or HTK. Available low-level descriptors include popular speech, music and video features including Mel-frequency and similar cepstral and spectral coefficients, Chroma, CENS, auditory model based loudness, voice quality, local binary pattern, color, and optical flow histograms. Besides, voice activity detection, pitch tracking and face detection are supported. openSMILE is implemented in C++, using standard open source libraries for on-line audio and video input. It is fast, runs on Unix and Windows platforms, and has a modular, component based architecture which makes extensions via plug-ins easy. openSMILE 2.0 is distributed under a research license and can be downloaded from http://opensmile.sourceforge.net/.
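
To make the functional-extraction and Weka-export step concrete, here is a hedged Python sketch: it computes a few of the functional types named above (moments, a simple peak count, linear-regression parameters) over one frame-level contour and writes them as a Weka ARFF file. The attribute names and the peak definition are illustrative assumptions, not the toolkit's actual feature names.

```python
import numpy as np

def functionals(contour):
    """A few of the functional types named above: moments, peaks, regression."""
    t = np.arange(len(contour))
    slope, offset = np.polyfit(t, contour, 1)      # linear regression parameters
    # Hypothetical peak definition: samples larger than both neighbours.
    peaks = np.sum((contour[1:-1] > contour[:-2]) & (contour[1:-1] > contour[2:]))
    return {
        "mean": contour.mean(),                     # first moment
        "variance": contour.var(),                  # second central moment
        "num_peaks": float(peaks),
        "reg_slope": slope,
        "reg_offset": offset,
    }

def write_arff(path, relation, rows):
    """Minimal Weka ARFF export of per-instance functionals."""
    names = sorted(rows[0])
    with open(path, "w") as f:
        f.write(f"@relation {relation}\n\n")
        for name in names:
            f.write(f"@attribute {name} numeric\n")
        f.write("\n@data\n")
        for row in rows:
            f.write(",".join(str(row[n]) for n in names) + "\n")

contour = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * np.random.randn(200)
write_arff("features.arff", "functionals_demo", [functionals(contour)])
```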

1,186 citations


Cites background from "Opensmile: the munich versatile and..."

  • ...Even though openSMILE originates from the audio processing domain as such – it has been featured in the 2010 ACM MM Open Source Software Competition [4] – it has recently been extended with basic video features, and, more importantly, its design is principally modality independent....


  • ...Details on the ring-buffer architecture can be found in [4] and on the project webpage....


Proceedings ArticleDOI
25 Aug 2013
TL;DR: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech, introduces conflict in group discussions as a new task, and deals with autism and its manifestations in speech.
Abstract: The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech. Finally, emotion is revisited as task, albeit with a broader range of overall twelve enacted emotional states. In this paper, we describe these four Sub-Challenges, their conditions, baselines, and a new feature set by the openSMILE toolkit, provided to the participants. Index Terms: Computational Paralinguistics, Challenge, Social Signals, Conflict, Emotion, Autism

694 citations


Cites methods from "Opensmile: the munich versatile and..."

  • ...Again, we use TUM’s open-source openSMILE feature extractor [27] and provide extracted feature sets on a per-chunk level (except for SVC)....


Journal ArticleDOI
TL;DR: The basic phenomenon as reflected in the last fifteen years is addressed, with commentary on databases, modelling and annotation, the unit of analysis and prototypicality, and automatic processing, including discussions of features, classification, robustness, evaluation, and implementation and system integration.

671 citations


Cites background or methods from "Opensmile: the munich versatile and..."

  • ...Future studies will very likely address feature importance across databases (Eyben et al., 2010a) and further types of efficient feature selection (Rong et al., 2007; Altun and Polat, 2009)....


  • ...In the Classifier Sub-Challenge, participants designed their own classifiers and had to use a selection of 384 standard acoustic features, computed with the openSMILE toolkit (Eyben et al., 2009, 2010c) provided by the organisers....


  • ...The Munich open-source Emotion and Affect Recognition Toolkit (openEAR) [65] is the first of its kind to provide a free open source toolkit that integrates all three necessary components: feature extraction (by the fast openSMILE backend [66]), classifiers, and pre-trained models....


  • ...A severe issue in cross/multi-corpora studies is the inhomogeneous labelling process, which often leads to inconsistent, incompatible or even distinct emotional classes (Eyben et al., 2010a)....


  • ...In the Classifier Sub-Challenge, participants designed their own classifiers and had to use a selection of 384 standard acoustic features, computed with the openSMILE toolkit [65, 66] provided by the organisers....


Proceedings ArticleDOI
22 Apr 2013
TL;DR: A new multimodal corpus of spontaneous collaborative and affective interactions in French, RECOLA, is presented and is being made available to the research community; it includes self-report measures taken from users during task completion.
Abstract: We present in this paper a new multimodal corpus of spontaneous collaborative and affective interactions in French: RECOLA, which is being made available to the research community. Participants were recorded in dyads during a video conference while completing a task requiring collaboration. Different multimodal data, i.e., audio, video, ECG and EDA, were recorded continuously and synchronously. In total, 46 participants took part in the test, for which the first 5 minutes of interaction were kept to ease annotation. In addition to these recordings, 6 annotators measured emotion continuously on two dimensions: arousal and valence, as well as social behavior labels on five dimensions. The corpus allowed us to take self-report measures of users during task completion. Methodologies and issues related to affective corpus construction are briefly reviewed in this paper. We further detail how the corpus was constructed, i.e., participants, procedure and task, the multimodal recording setup, the annotation of data and some analysis of the quality of these annotations.
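
With six continuous traces per affective dimension, one simple way to quantify annotation quality (the paper's own analysis may differ) is mean pairwise inter-annotator correlation. The following Python sketch, with dummy data and an assumed sampling rate, illustrates the idea.

```python
import numpy as np
from itertools import combinations

def mean_pairwise_correlation(traces):
    """Mean Pearson correlation over all annotator pairs.

    traces: array of shape (n_annotators, n_samples) holding the
    continuous ratings (e.g. arousal) of each annotator over time.
    """
    corrs = [np.corrcoef(traces[i], traces[j])[0, 1]
             for i, j in combinations(range(len(traces)), 2)]
    return float(np.mean(corrs))

# Dummy data standing in for six annotators rating 5 minutes at an assumed 25 Hz.
rng = np.random.default_rng(0)
target = np.cumsum(rng.normal(size=7500))            # shared latent rating
traces = target + rng.normal(scale=5.0, size=(6, 7500))
print(f"mean pairwise r = {mean_pairwise_correlation(traces):.2f}")
```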

630 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: An LSTM-based model is proposed that enables utterances to capture contextual information from their surroundings in the same video, thus aiding the classification process and showing 5-10% performance improvement over the state of the art and high robustness to generalizability.
Abstract: Multimodal sentiment analysis is a developing area of research, which involves the identification of sentiments in videos. Current research considers utterances as independent entities, i.e., ignores the interdependencies and relations among the utterances of a video. In this paper, we propose an LSTM-based model that enables utterances to capture contextual information from their surroundings in the same video, thus aiding the classification process. Our method shows 5-10% performance improvement over the state of the art and high robustness to generalizability.
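
The central idea, letting each utterance's classification see neighbouring utterances in the same video, can be illustrated with a bidirectional LSTM over the sequence of per-utterance feature vectors. The PyTorch sketch below uses assumed dimensions and is an illustration of the approach, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ContextualUtteranceClassifier(nn.Module):
    """Classify each utterance using context from surrounding utterances."""

    def __init__(self, feat_dim=100, hidden=64, n_classes=2):
        super().__init__()
        # A bidirectional LSTM lets each utterance see past and future context.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, utterances):
        # utterances: (batch, n_utterances, feat_dim), one video per row
        context, _ = self.lstm(utterances)   # contextualised representations
        return self.out(context)             # per-utterance class scores

# One dummy video with 30 utterances of 100-dim features (assumed sizes).
model = ContextualUtteranceClassifier()
scores = model(torch.randn(1, 30, 100))
print(scores.shape)  # torch.Size([1, 30, 2])
```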

570 citations


Cites background or methods from "Opensmile: the munich versatile and..."

  • ...(Metallinou et al., 2008) and (Eyben et al., 2010a) fused audio and textual modalities for emotion recognition....


  • ...To compute the features, we use openSMILE (Eyben et al., 2010b), an open-source software that automatically extracts audio features such as pitch and voice intensity....


References
Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
  • Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
  • Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
  • Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface; algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization

20,196 citations

Book
01 Jan 2008
TL;DR: In this paper, generalized estimating equations (GEEs), with computation using PROC GENMOD in SAS, and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC are discussed.
Abstract: ...logistic regression, and it concerns studying the effect of covariates on the risk of disease. The chapter includes generalized estimating equations (GEEs), with computation using PROC GENMOD in SAS, and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC. As a prelude to the following chapter on repeated-measures data, Chapter 5 presents time series analysis. The material on repeated-measures analysis uses linear additive models with GEEs and PROC MIXED in SAS for linear mixed-effects models. Chapter 7 is about survival data analysis. All computing throughout the book is done using SAS procedures.

9,995 citations

01 Jan 2006

5,265 citations

01 Jan 2006

1,009 citations


"Opensmile: the munich versatile and..." refers methods in this paper

  • ...Related feature extraction tools used for speech research include e.g. the Hidden Markov Model Toolkit (HTK) [15], the PRAAT Software [3], the Speech Filing System (SFS), the Auditory Toolbox, a Matlab™ toolbox by Raul Fernandez [6], the Tracter framework [7], and the SNACK package for the Tcl scripting language....
