Proceedings ArticleDOI

A database of German emotional speech.

04 Sep 2005, pp. 1517–1520
TL;DR: A database of emotional speech, evaluated in a perception test for the recognisability and naturalness of the emotions, and publicly accessible via the internet.
Abstract: The article describes a database of emotional speech. Ten actors (5 female and 5 male) simulated the emotions, producing 10 German utterances (5 short and 5 longer sentences) which could be used in everyday communication and are interpretable in all applied emotions. The recordings were made in an anechoic chamber with high-quality recording equipment. In addition to the sound, electro-glottograms were recorded. The speech material comprises about 800 sentences (seven emotions * ten actors * ten sentences + some second versions). The complete database was evaluated in a perception test regarding the recognisability of emotions and their naturalness. Utterances recognised better than 80% and judged as natural by more than 60% of the listeners were phonetically labelled in a narrow transcription with special markers for voice quality, phonatory and articulatory settings, and articulatory features. The database can be accessed by the public via the internet (http://www.expressive-speech.net/emodb/).
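
The selection criterion above (recognition better than 80% and naturalness judgements from more than 60% of listeners) amounts to a simple filter over the perception-test results. Below is a minimal sketch of that filter in Python; the per-utterance score records are hypothetical placeholders, not part of any published interface for the database.

    # Minimal sketch: keep only utterances meeting the perception-test thresholds
    # reported for the database (recognition > 80%, naturalness > 60%).
    # The score records below are hypothetical placeholders.
    utterances = [
        {"file": "utt_001.wav", "recognition": 0.92, "naturalness": 0.71},
        {"file": "utt_002.wav", "recognition": 0.78, "naturalness": 0.66},
    ]

    selected = [
        u for u in utterances
        if u["recognition"] > 0.80 and u["naturalness"] > 0.60
    ]
    print(selected)  # only utt_001.wav passes both thresholds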


Citations
Journal ArticleDOI
TL;DR: A survey of speech emotion classification addressing three important aspects of the design of a speech emotion recognition system, among them the choice of suitable features for speech representation and the proper preparation of an emotional speech database for evaluating system performance.

1,735 citations


Cites background or methods from "A database of German emotional spee..."

  • ...database [18] Public and free German 800 utterances (10 actors...

  • ...: Some corpus developers prefer that the number of utterances for each emotion is almost the same in order to properly evaluate the classification accuracy such as in the Berlin corpus [18]....

  • ...The GMVAR model was applied to the Berlin emotional speech database [18] which contained the anger, fear, happiness, boredom, sadness, disgust, and neutral emotions....

  • ...In many databases, it is difficult even for human subjects to determine the emotion of some recorded utterances; e.g. the human recognition accuracy was 67% for DED [38], 80% for Berlin [18], and 65% in [94]....

Journal ArticleDOI
TL;DR: A basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis, is proposed and intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters.
Abstract: Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size.
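
The parameter set is distributed with the openSMILE toolkit, and a common way to extract it programmatically is through the Python wrapper. The following is a minimal sketch, assuming the opensmile Python package is installed; the exact feature-set enum names vary between package versions, so treat the identifiers here as assumptions to verify against the installed version.

    # Minimal sketch: extract GeMAPS functionals for one recording with the
    # openSMILE Python wrapper (pip install opensmile). The enum names are
    # assumptions that may differ between package versions.
    import opensmile

    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.GeMAPSv01b,        # minimalistic GeMAPS set
        feature_level=opensmile.FeatureLevel.Functionals,   # one feature vector per file
    )

    features = smile.process_file("example.wav")  # pandas DataFrame with one row
    print(features.shape)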

1,158 citations


Additional excerpts

  • ...It was introduced by [48]....


Journal ArticleDOI
16 May 2018, PLOS ONE
TL;DR: The RAVDESS is a validated multimodal database of emotional speech and song, recorded from 24 professional actors vocalizing lexically-matched statements in a neutral North American accent, and shows high levels of emotional validity and test-retest intrarater reliability.
Abstract: The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced, consisting of 24 professional actors vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976.
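
For programmatic use, each RAVDESS recording encodes its conditions in the filename. The sketch below parses that convention; the field layout and numeric codes are taken from the dataset's public documentation rather than from the abstract above, so treat them as assumptions to verify against the Zenodo README.

    # Hypothetical sketch: decode a RAVDESS filename into its condition labels.
    # The seven dash-separated fields and their codes are assumptions based on the
    # dataset's public documentation, not on the abstract above.
    EMOTIONS = {
        "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
        "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
    }

    def parse_ravdess(filename: str) -> dict:
        """Split e.g. '03-01-06-01-02-01-12.wav' into labelled fields."""
        modality, channel, emotion, intensity, statement, repetition, actor = (
            filename.removesuffix(".wav").split("-")
        )
        return {
            "vocal_channel": "speech" if channel == "01" else "song",
            "emotion": EMOTIONS[emotion],
            "intensity": "normal" if intensity == "01" else "strong",
            "actor": int(actor),
            "actor_gender": "male" if int(actor) % 2 == 1 else "female",
        }

    print(parse_ravdess("03-01-06-01-02-01-12.wav"))  # fearful speech, actor 12 (female)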

1,036 citations

Journal ArticleDOI
TL;DR: This first-of-its-kind, comprehensive literature review of the diverse field of affective computing focuses mainly on the use of audio, visual and text information for multimodal affect analysis, and outlines existing methods for fusing information from different modalities.

969 citations

Journal ArticleDOI
TL;DR: This survey defines the task of hierarchical classification, discusses why some related tasks should not be considered hierarchical classification, presents a new perspective on existing hierarchical classification approaches, and proposes a new unifying framework to classify the existing approaches.
Abstract: In this survey we discuss the task of hierarchical classification. The literature about this field is scattered across very different application domains, and for that reason research in one domain is often done unaware of methods developed in other domains. We define the task of hierarchical classification and discuss why some related tasks should not be considered hierarchical classification. We also present a new perspective on some existing hierarchical classification approaches, and based on that perspective we propose a new unifying framework to classify the existing approaches. We also present a review of empirical comparisons of the existing methods reported in the literature as well as a conceptual comparison of those methods at a high level of abstraction, discussing their advantages and disadvantages.
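
To make the top-down flavour of hierarchical classification concrete, here is a minimal sketch of one common strategy, a local classifier per parent node, applied to a toy emotion hierarchy. The hierarchy, the synthetic data, and the use of scikit-learn are illustrative assumptions; this is not the unifying framework proposed in the survey.

    # Minimal sketch: top-down hierarchical classification with one local classifier
    # per parent node. The toy hierarchy and synthetic data are assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy hierarchy: root -> {negative, positive}; negative -> {anger, sadness};
    # positive -> {happiness, neutral}.
    hierarchy = {
        "root": ["negative", "positive"],
        "negative": ["anger", "sadness"],
        "positive": ["happiness", "neutral"],
    }
    parent_of = {"anger": "negative", "sadness": "negative",
                 "happiness": "positive", "neutral": "positive"}

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))                # stand-in acoustic features
    y = rng.choice(list(parent_of), size=200)    # stand-in leaf labels

    # Train one classifier per parent node on the examples routed to that node.
    models = {}
    for parent in hierarchy:
        if parent == "root":
            mask = np.ones(len(y), dtype=bool)
            node_y = np.array([parent_of[label] for label in y])
        else:
            mask = np.array([parent_of[label] == parent for label in y])
            node_y = y[mask]
        models[parent] = LogisticRegression(max_iter=1000).fit(X[mask], node_y)

    def predict_top_down(x):
        """Walk from the root, letting each local classifier pick one child."""
        node = "root"
        while node in hierarchy:
            node = models[node].predict(x.reshape(1, -1))[0]
        return node

    print(predict_top_down(X[0]))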

933 citations


Cites methods from "A database of German emotional spee..."

  • ...13 The hierarchy used for mood classification based on speech in the Berlin Dataset (Burkhardt et al. 2005) used by Xiao et al. (2007)...

  • ...The database used in this paper is Berlin emotional speech database (Burkhardt et al. 2005)....

References
Journal ArticleDOI
TL;DR: Findings on decoding replicate earlier findings on the ability of judges to infer vocally expressed emotions with much-better-than-chance accuracy, including consistently found differences in the recognizability of different emotions.
Abstract: Professional actors' portrayals of 14 emotions varying in intensity and valence were presented to judges. The results on decoding replicate earlier findings on the ability of judges to infer vocally expressed emotions with much-better-than-chance accuracy, including consistently found differences in the recognizability of different emotions. A total of 224 portrayals were subjected to digital acoustic analysis to obtain profiles of vocal parameters for different emotions. The data suggest that vocal parameters not only index the degree of intensity typical for different emotions but also differentiate valence or quality aspects. The data are also used to test theoretical predictions on vocal patterning based on the component process model of emotion (K.R. Scherer, 1986). Although most hypotheses are supported, some need to be revised on the basis of the empirical evidence. Discriminant analysis and jackknifing show remarkably high hit rates and patterns of confusion that closely mirror those found for listener-judges.
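
The abstract's closing step, discriminant analysis evaluated by jackknifing, corresponds to what is now usually run as linear discriminant analysis with leave-one-out cross-validation. The sketch below illustrates that general procedure, not the authors' exact analysis; the synthetic vocal-parameter profiles and emotion labels are placeholders.

    # Minimal sketch: discriminant analysis with leave-one-out ("jackknife")
    # evaluation, in the spirit of the procedure described above. The synthetic
    # vocal-parameter profiles and emotion labels are placeholders.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(224, 12))      # 224 portrayals x 12 acoustic parameters
    y = rng.integers(0, 14, size=224)   # stand-in labels for 14 emotion categories

    lda = LinearDiscriminantAnalysis()
    scores = cross_val_score(lda, X, y, cv=LeaveOneOut())
    print(f"leave-one-out hit rate: {scores.mean():.2f}")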

1,862 citations

Book
01 Jan 1980
TL;DR: Basic analytic concepts are defined, and supralaryngeal, phonatory, and tension settings are described, together with labels and notation for phonetic settings.
Contents: Introduction; 1. Basic analytic concepts; 2. Supralaryngeal settings; 3. Phonatory settings; 4. Tension settings; 5. Labels and notation for phonetic settings; References; Index.

1,000 citations


"A database of German emotional spee..." refers background in this paper

  • ...Emotional characteristics of voice and manner of speaking were labelled with additional characterisations, namely annotations of articulatory settings like harsh voice or whispery voice [11, 12]....


01 Sep 2000
TL;DR: An audiovisual database of emotionally coloured clips recorded from people discussing emotive subjects either with each other or with one of the research team, with the FEELTRACE system providing a continuous record of the perceived ebb and flow of emotion in each clip.
Abstract: Research on the expression of emotion is underpinned by databases. Reviewing available resources persuaded us of the need to develop one that prioritised ecological validity. The basic unit of the database is a clip, which is an audiovisual recording of an episode that appears to be reasonably self-contained. Clips range from 10-60 secs, and are captured as MPEG files. They were drawn from two main sources. People were recorded discussing emotive subjects either with each other, or with one of the research team. We also recorded extracts from television programs where members of the public interact in a way that at least appears essentially spontaneous. Associated with each clip are two additional types of file. An audio file (.wav format) contains speech alone, edited to remove sounds other than the main speaker. An interpretation file describes the emotional state that observers attribute to the main speaker, using the FEELTRACE system to provide a continuous record of the perceived ebb and flow of emotion. Clips have been extracted for 100 speakers, with at least two for each speaker (one relatively neutral and others showing marked emotions of different kinds).

179 citations


"A database of German emotional spee..." refers background in this paper

  • ...Although having been studied since the 1950’s, the investigation of emotional cues in speech is gaining growing attention....


01 Jan 2000
TL;DR: In this paper, the authors explored the perceptual relevance of acoustical correlates of emotional speech by means of speech synthesis and developed emotion-rules which enable an optimized speech synthesis system to generate emotional speech.
Abstract: This paper explores the perceptual relevance of acoustical correlates of emotional speech by means of speech synthesis. In addition, the research aims at the development of emotion rules which enable an optimized speech synthesis system to generate emotional speech. Two investigations using this synthesizer are described: 1) the systematic variation of selected acoustical features to gain a preliminary impression regarding the importance of certain acoustical features for emotional expression, and 2) the specific manipulation of a stimulus spoken under an emotionally neutral condition to investigate further the effect of certain features and the overall ability of the synthesizer to generate recognizable emotional expression. It is shown that this approach is indeed capable of generating emotional speech that is recognized almost as well as utterances realized by actors.
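
The kind of systematic manipulation the abstract describes, varying selected acoustic features of a neutrally spoken utterance, can be prototyped crudely with off-the-shelf signal processing. The sketch below raises the pitch level and speech rate of a recording using librosa; it is not the rule set or synthesizer from the paper, and the input filename and parameter values are assumptions.

    # Illustrative sketch: crude manipulation of two acoustic features (pitch level
    # and speech rate) of a neutrally spoken utterance. Not the paper's rule set or
    # synthesizer; filename and parameter values are assumptions.
    import librosa
    import soundfile as sf

    y, sr = librosa.load("neutral_utterance.wav", sr=None)   # hypothetical input file

    # Raise the pitch level by 3 semitones and speed the utterance up by 15 percent,
    # a combination often associated with higher-arousal emotions in the literature.
    y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=3.0)
    y_faster = librosa.effects.time_stretch(y_shifted, rate=1.15)

    sf.write("manipulated_utterance.wav", y_faster, sr)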

166 citations

Book
01 Jan 1991

47 citations


"A database of German emotional spee..." refers background in this paper

  • ...Emotional characteristics of voice and manner of speaking were labelled with additional characterisations, namely annotations of articulatory settings like harsh voice or whispery voice [11, 12]....
