Author

Jeroen Lichtenauer

Bio: Jeroen Lichtenauer is an academic researcher from Imperial College London. The author has contributed to research in topics: Timestamp & Gesture. The author has an h-index of 4 and has co-authored 7 publications receiving 944 citations.

Papers
Journal ArticleDOI
TL;DR: Single-modality and modality-fusion results for both emotion recognition and implicit tagging experiments are reported; they show the potential uses of the recorded modalities and the significance of the emotion elicitation protocol.
Abstract: MAHNOB-HCI is a multimodal database recorded in response to affective stimuli with the goal of emotion recognition and implicit tagging research. A multimodal setup was arranged for synchronized recording of face videos, audio signals, eye gaze data, and peripheral/central nervous system physiological signals. Twenty-seven participants from both genders and different cultural backgrounds participated in two experiments. In the first experiment, they watched 20 emotional videos and self-reported their felt emotions using arousal, valence, dominance, and predictability as well as emotional keywords. In the second experiment, short videos and images were shown once without any tag and then with correct or incorrect tags. Agreement or disagreement with the displayed tags was assessed by the participants. The recorded videos and bodily responses were segmented and stored in a database. The database is made available to the academic community via a web-based system. The collected data were analyzed and single modality and modality fusion results for both emotion recognition and implicit tagging experiments are reported. These results show the potential uses of the recorded modalities and the significance of the emotion elicitation protocol.

1,162 citations
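The abstract above describes, per participant and stimulus, a set of synchronized modality recordings plus self-reported ratings (arousal, valence, dominance, predictability) and emotional keywords. As a rough illustration of how such a trial could be represented in code, here is a minimal Python sketch; the class, field names, rating scale and file layout are assumptions for illustration, not the database's actual schema or API.

# Hypothetical sketch, not the official MAHNOB-HCI schema or API: one way to
# represent a single emotion-elicitation trial with its synchronized modality
# recordings and the self-report ratings described in the abstract above.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Trial:
    participant_id: int
    stimulus_id: str                      # which of the 20 emotional videos was shown
    arousal: int                          # self-report ratings; the 1-9 scale is assumed
    valence: int
    dominance: int
    predictability: int
    emotion_keywords: List[str] = field(default_factory=list)
    modalities: Dict[str, str] = field(default_factory=dict)  # modality name -> file path (layout assumed)

# Example with made-up values:
trial = Trial(
    participant_id=1,
    stimulus_id="video_07",
    arousal=6, valence=3, dominance=4, predictability=5,
    emotion_keywords=["fear"],
    modalities={"face_video": "P1/video_07.avi", "physio": "P1/video_07.bdf"},
)
print(trial.stimulus_id, trial.emotion_keywords)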

Book ChapterDOI
09 Oct 2011
TL;DR: In this article, the authors introduce a multi-modal database for the analysis of human interaction, in particular mimicry, and elaborate on the theoretical hypotheses of the relationship between the occurrence of mimicry and human affect.
Abstract: In this paper we introduce a multi-modal database for the analysis of human interaction, in particular mimicry, and elaborate on the theoretical hypotheses of the relationship between the occurrence of mimicry and human affect. The recorded experiments are designed to explore this relationship. The corpus is recorded with 18 synchronised audio and video sensors, and is annotated for many different phenomena, including dialogue acts, turn-taking, affect, head gestures, hand gestures, body movement and facial expression. Recordings were made of two experiments: a discussion on a political topic, and a role-playing game. 40 participants were recruited, all of whom self-reported their felt experiences. The corpus will be made available to the scientific community.

43 citations

Journal ArticleDOI
TL;DR: This work centralises the synchronisation task by recording all trigger or timestamp signals with a multi-channel audio interface, and shows that a consumer PC can currently capture 8-bit video data at 1024x1024 spatial and 59.1Hz temporal resolution from at least 14 cameras, together with 8 channels of 24-bit audio at 96kHz.

32 citations
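The TL;DR above centralises synchronisation by feeding every sensor's trigger or timestamp pulses into channels of a single multi-channel audio interface, so that all events share one clock. The sketch below illustrates that idea only: detect pulse onsets in a recorded trigger channel and convert them to timestamps on the audio clock. The sample rate, threshold and synthetic 60 fps camera trigger are assumptions, and this is not the authors' software.

# Illustration of the synchronisation idea described above (not the authors' code):
# trigger pulses recorded on an audio channel are detected and timestamped on the
# shared audio clock, so any sensor that emits pulses can be aligned with the rest.
import numpy as np

AUDIO_RATE = 96_000  # Hz; sample rate of the audio interface (assumed)

def pulse_onsets(trigger_channel: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return sample indices where the trigger signal rises above the threshold."""
    above = trigger_channel > threshold
    return np.flatnonzero(~above[:-1] & above[1:]) + 1

def onsets_to_seconds(onsets: np.ndarray, rate: int = AUDIO_RATE) -> np.ndarray:
    """Convert onset sample indices into timestamps on the common audio clock."""
    return onsets / rate

# Synthetic example: a camera exposing at 60 fps emits one pulse per frame
# (96000 / 60 = 1600 samples between pulses).
trigger = np.zeros(AUDIO_RATE)
trigger[800::1600] = 1.0
frame_times = onsets_to_seconds(pulse_onsets(trigger))
print(len(frame_times), "frame pulses; first timestamps:", frame_times[:3])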

Proceedings ArticleDOI
02 Sep 2009
TL;DR: This work achieves accurate synchronization between audio, video and additional sensors by recording audio together with sensor trigger or timestamp signals using a multi-channel audio input, allowing maximal flexibility with minimal cost.
Abstract: Applications such as surveillance and human motion capture require high-bandwidth recording from multiple cameras. Furthermore, the recent increase in research on sensor fusion has raised the demand on synchronization accuracy between video, audio and other sensor modalities. Previously, capturing synchronized, high resolution video from multiple cameras required complex, inflexible and expensive solutions. Our experiments show that a single PC, built from contemporary low-cost computer hardware, could currently handle up to 470MB/s of input data. This allows capturing from 18 cameras of 780x580 pixels at 60fps each, or 36 cameras at 30fps. Furthermore, we achieve accurate synchronization between audio, video and additional sensors, by recording audio together with sensor trigger or timestamp signals, using a multi-channel audio input. In this way, each sensor modality can be captured with separate software and hardware, allowing maximal flexibility with minimal cost.

26 citations
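The 470MB/s figure above can be sanity-checked with simple arithmetic, assuming raw 8-bit video (one byte per pixel) from each camera: 18 cameras at 780x580 and 60fps come out to roughly 489 million bytes per second, i.e. about 466MB/s if a megabyte is read as 2^20 bytes, in line with the quoted capacity. A quick sketch of that calculation (the byte-per-pixel assumption is mine):

# Back-of-the-envelope check of the camera bandwidth quoted above, assuming
# raw 8-bit video, i.e. one byte per pixel (an assumption, not stated explicitly here).
def video_rate_bytes_per_s(width: int, height: int, fps: float, cameras: int,
                           bytes_per_pixel: int = 1) -> float:
    """Aggregate raw video data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps * cameras

rate = video_rate_bytes_per_s(780, 580, 60, 18)          # 18 cameras at 60 fps
print(rate / 1e6, "MB/s (decimal)")                      # ~488.6
print(rate / 2**20, "MB/s (binary, 2^20 bytes)")         # ~466.0
print(video_rate_bytes_per_s(780, 580, 30, 36) == rate)  # 36 cameras at 30 fps: same total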

Proceedings ArticleDOI
01 Nov 2011
TL;DR: This work presents a framework for efficient, omnidirectional head-pose initialisation and tracking in the presence of missing and false positive marker detections, which enables easy, accurate and synchronous head-motion capture as ground truth for, or input to, other machine vision algorithms.
Abstract: Conventional marker-based optical motion capture methods rely on scene attenuation (e.g. by infrared-pass filtering). This renders the images useless for development and testing of machine vision methods under natural conditions. Unfortunately, combining, calibrating and synchronising a system for motion capture with a separate camera is a costly and cumbersome task. To overcome this problem, we present a framework for efficient, omnidirectional head-pose initialisation and tracking in the presence of missing and false positive marker detections. As such, it finally enables easy, accurate and synchronous head-motion capture as ground truth for, or input to, other machine vision algorithms.

3 citations
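The abstract above is about recovering head pose from optical markers despite missing and false positive detections. The authors' initialisation and tracking framework is not reproduced here, but the core geometric step in any marker-based pose tracker is rigidly aligning the detected 3D marker positions with a reference marker model; the sketch below shows only that generic building block (the Kabsch algorithm), with made-up marker data.

# Illustrative sketch of a generic building block of marker-based pose tracking:
# rigid alignment (rotation R, translation t) of detected 3D marker positions to a
# reference marker model via the Kabsch algorithm. This is not the specific
# initialisation/tracking framework described in the paper above.
import numpy as np

def rigid_transform(model: np.ndarray, detected: np.ndarray):
    """Find R, t minimising ||R @ model_i + t - detected_i||; both arrays are (N, 3)."""
    cm, cd = model.mean(axis=0), detected.mean(axis=0)
    H = (model - cm).T @ (detected - cd)          # 3x3 cross-covariance of matched markers
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cm
    return R, t

# Toy example: rotate and translate a 4-marker model, then recover the pose.
rng = np.random.default_rng(0)
model = rng.normal(size=(4, 3))
angle = np.deg2rad(30)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
detected = model @ R_true.T + np.array([0.1, -0.2, 0.3])
R_est, t_est = rigid_transform(model, detected)
print(np.allclose(R_est, R_true), np.round(t_est, 3))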


Cited by
Journal ArticleDOI
TL;DR: A multimodal data set for the analysis of human affective states is presented, and a novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.
Abstract: We present a multimodal data set for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool. An extensive analysis of the participants' ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants' ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence, and like/dislike ratings using the modalities of EEG, peripheral physiological signals, and multimedia content analysis. Finally, decision fusion of the classification results from different modalities is performed. The data set is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods.

3,013 citations
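The abstract above closes with decision fusion of the classification results from the different modalities. A common, simple decision-level fusion rule is a weighted sum of each modality's class probabilities; the sketch below illustrates that general idea with made-up numbers and weights, and is not necessarily the exact fusion scheme used in the paper.

# Illustrative decision-level fusion: combine per-modality class probabilities
# with a weighted sum. Weights, class labels and probabilities are made up;
# this is a generic example, not necessarily the paper's exact scheme.
import numpy as np

def fuse_decisions(probs_per_modality: dict, weights: dict) -> np.ndarray:
    """Weighted sum of per-modality class-probability vectors, renormalised to sum to 1."""
    fused = None
    for name, probs in probs_per_modality.items():
        contribution = weights[name] * np.asarray(probs, dtype=float)
        fused = contribution if fused is None else fused + contribution
    return fused / fused.sum()

# Toy example: three arousal classes (low / medium / high), two modalities.
probs = {
    "eeg":        [0.2, 0.6, 0.2],   # hypothetical classifier outputs
    "peripheral": [0.1, 0.3, 0.6],
}
weights = {"eeg": 0.6, "peripheral": 0.4}
fused = fuse_decisions(probs, weights)
print(np.round(fused, 2), "-> predicted class:", int(np.argmax(fused)))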

Journal ArticleDOI
TL;DR: Single-modality and modality-fusion results for both emotion recognition and implicit tagging experiments are reported; they show the potential uses of the recorded modalities and the significance of the emotion elicitation protocol.
Abstract: MAHNOB-HCI is a multimodal database recorded in response to affective stimuli with the goal of emotion recognition and implicit tagging research. A multimodal setup was arranged for synchronized recording of face videos, audio signals, eye gaze data, and peripheral/central nervous system physiological signals. Twenty-seven participants from both genders and different cultural backgrounds participated in two experiments. In the first experiment, they watched 20 emotional videos and self-reported their felt emotions using arousal, valence, dominance, and predictability as well as emotional keywords. In the second experiment, short videos and images were shown once without any tag and then with correct or incorrect tags. Agreement or disagreement with the displayed tags was assessed by the participants. The recorded videos and bodily responses were segmented and stored in a database. The database is made available to the academic community via a web-based system. The collected data were analyzed and single modality and modality fusion results for both emotion recognition and implicit tagging experiments are reported. These results show the potential uses of the recorded modalities and the significance of the emotion elicitation protocol.

1,162 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a review of 154 studies that apply deep learning to EEG, published between 2010 and 2018, and spanning different application domains such as epilepsy, sleep, brain-computer interfacing, and cognitive and affective monitoring.
Abstract: Context Electroencephalography (EEG) is a complex signal and can require several years of training, as well as advanced signal processing and feature extraction methodologies to be correctly interpreted. Recently, deep learning (DL) has shown great promise in helping make sense of EEG signals due to its capacity to learn good feature representations from raw data. Whether DL truly presents advantages as compared to more traditional EEG processing approaches, however, remains an open question. Objective In this work, we review 154 papers that apply DL to EEG, published between January 2010 and July 2018, and spanning different application domains such as epilepsy, sleep, brain-computer interfacing, and cognitive and affective monitoring. We extract trends and highlight interesting approaches from this large body of literature in order to inform future research and formulate recommendations. Methods Major databases spanning the fields of science and engineering were queried to identify relevant studies published in scientific journals, conferences, and electronic preprint repositories. Various data items were extracted for each study pertaining to (1) the data, (2) the preprocessing methodology, (3) the DL design choices, (4) the results, and (5) the reproducibility of the experiments. These items were then analyzed one by one to uncover trends. Results Our analysis reveals that the amount of EEG data used across studies varies from less than ten minutes to thousands of hours, while the number of samples seen during training by a network varies from a few dozens to several millions, depending on how epochs are extracted. Interestingly, we saw that more than half the studies used publicly available data and that there has also been a clear shift from intra-subject to inter-subject approaches over the last few years. About [Formula: see text] of the studies used convolutional neural networks (CNNs), while [Formula: see text] used recurrent neural networks (RNNs), most often with a total of 3-10 layers. Moreover, almost one-half of the studies trained their models on raw or preprocessed EEG time series. Finally, the median gain in accuracy of DL approaches over traditional baselines was [Formula: see text] across all relevant studies. More importantly, however, we noticed studies often suffer from poor reproducibility: a majority of papers would be hard or impossible to reproduce given the unavailability of their data and code. Significance To help the community progress and share work more effectively, we provide a list of recommendations for future studies and emphasize the need for more reproducible research. We also make our summary table of DL and EEG papers available and invite authors of published work to contribute to it directly. A planned follow-up to this work will be an online public benchmarking portal listing reproducible results.

699 citations

Journal ArticleDOI
TL;DR: To meet the need for publicly available corpora of well-labeled video, the Denver Intensity of Spontaneous Facial Action (DISFA) database is collected, ground-truthed, and prepared for distribution.
Abstract: Access to well-labeled recordings of facial expression is critical to progress in automated facial expression recognition. With few exceptions, publicly available databases are limited to posed facial behavior that can differ markedly in conformation, intensity, and timing from what occurs spontaneously. To meet the need for publicly available corpora of well-labeled video, we collected, ground-truthed, and prepared for distribution the Denver intensity of spontaneous facial action database. Twenty-seven young adults were video recorded by a stereo camera while they viewed video clips intended to elicit spontaneous emotion expression. Each video frame was manually coded for presence, absence, and intensity of facial action units according to the facial action unit coding system. Action units are the smallest visibly discriminable changes in facial action; they may occur individually and in combinations to comprise more molar facial expressions. To provide a baseline for use in future research, protocols and benchmarks for automated action unit intensity measurement are reported. Details are given for accessing the database for research in computer vision, machine learning, and affective and behavioral science.

650 citations
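The abstract above describes coding every video frame for the presence, absence and intensity of facial action units (AUs). The tiny sketch below only illustrates what frame-level AU intensity labels can look like and how presence follows from intensity; the 0-5 ordinal scale and the AU numbers are common FACS conventions used here as assumptions, not a claim about the database's actual file format.

# Hypothetical illustration of frame-level action-unit (AU) intensity labels.
# The 0-5 ordinal intensity scale and the AU numbers follow common FACS
# conventions; this is not the database's actual file format.
from typing import Dict, List

FrameLabels = Dict[int, int]   # AU number -> intensity (0 = absent, 5 = maximum)

frame_42: FrameLabels = {1: 0, 2: 0, 6: 3, 12: 4, 25: 1}   # e.g. a smile: AU6 + AU12

def present_aus(labels: FrameLabels, min_intensity: int = 1) -> List[int]:
    """Return the AUs whose coded intensity meets the presence threshold."""
    return sorted(au for au, intensity in labels.items() if intensity >= min_intensity)

print(present_aus(frame_42))   # -> [6, 12, 25]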

Journal ArticleDOI
TL;DR: A survey of the neurophysiological research performed from 2009 to 2016 is presented, providing a comprehensive overview of the existing work on emotion recognition using EEG signals, together with a set of good-practice recommendations that researchers must follow to achieve reproducible, replicable, well-validated and high-quality results.
Abstract: Emotions have an important role in daily life, not only in human interaction, but also in decision-making processes, and in the perception of the world around us. Due to the recent interest shown by the research community in establishing emotional interactions between humans and computers, identifying the emotional state of the former has become a need. This can be achieved through multiple measures, such as subjective self-reports, and autonomic and neurophysiological measurements. In recent years, electroencephalography (EEG) has received considerable attention from researchers, since it can provide a simple, cheap, portable, and easy-to-use solution for identifying emotions. In this paper, we present a survey of the neurophysiological research performed from 2009 to 2016, providing a comprehensive overview of the existing work on emotion recognition using EEG signals. We focus our analysis on the main aspects involved in the recognition process (e.g., subjects, features extracted, classifiers), and compare the works with respect to them. From this analysis, we propose a set of good-practice recommendations that researchers must follow to achieve reproducible, replicable, well-validated and high-quality results. We intend this survey to be useful for the research community working on emotion recognition through EEG signals, and in particular for those entering this field of research, since it offers a structured starting point.

640 citations
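The survey above repeatedly refers to the features extracted from EEG and the classifiers built on them. One of the most common feature families in this literature is power in the standard EEG frequency bands; the sketch below shows a generic version of that computation using SciPy's Welch estimator. The band edges, sampling rate and synthetic signal are conventional or made-up values, and this is not any particular paper's pipeline.

# Generic illustration of a very common EEG feature in this literature: average
# power spectral density in the standard frequency bands, estimated with Welch's
# method. Band edges are conventional approximations, not a specific paper's choice.
import numpy as np
from scipy.signal import welch

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_powers(eeg: np.ndarray, fs: float) -> dict:
    """Mean PSD per band for a single-channel EEG segment `eeg` sampled at `fs` Hz."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(len(eeg), 4 * int(fs)))
    return {name: float(psd[(freqs >= lo) & (freqs < hi)].mean())
            for name, (lo, hi) in BANDS.items()}

# Synthetic example: 10 s of "EEG" dominated by a 10 Hz (alpha) oscillation plus noise.
fs = 256.0
t = np.arange(0, 10, 1 / fs)
signal = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(len(t))
print(band_powers(signal, fs))   # the alpha band should dominate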