scispace - formally typeset
Search or ask a question
Author

Andreas Bulling

Bio: Andreas Bulling is an academic researcher from University of Stuttgart. The author has contributed to research in topics: Eye tracking & Gaze. The author has an hindex of 54, co-authored 230 publications receiving 11385 citations. Previous affiliations of Andreas Bulling include Association for Computing Machinery & German Research Centre for Artificial Intelligence.


Papers
More filters
Journal ArticleDOI
TL;DR: In this paper, the authors provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition using on-body inertial sensors and describe the concept of an Activity Recognition Chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems.
Abstract: The last 20 years have seen ever-increasing research activity in the field of human activity recognition. With activity recognition having considerably matured, so has the number of challenges in designing, implementing, and evaluating activity recognition systems. This tutorial aims to provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition. It specifically focuses on activity recognition using on-body inertial sensors. We first discuss the key research challenges that human activity recognition shares with general pattern recognition and identify those challenges that are specific to human activity recognition. We then describe the concept of an Activity Recognition Chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems. We detail each component of the framework, provide references to related research, and introduce the best practice methods developed by the activity recognition research community. We conclude with the educational example problem of recognizing different hand gestures from inertial sensors attached to the upper and lower arm. We illustrate how each component of this framework can be implemented for this specific activity recognition problem and demonstrate how different implementations compare and how they impact overall recognition performance.

1,214 citations

01 Jan 2014
TL;DR: This tutorial aims to provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition using on-body inertial sensors and describes the concept of an Activity Recognition Chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems.
Abstract: The last 20 years have seen ever-increasing research activity in the field of human activity recognition. With activity recognition having considerably matured, so has the number of challenges in designing, implementing, and evaluating activity recognition systems. This tutorial aims to provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition. It specifically focuses on activity recognition using on-body inertial sensors. We first discuss the key research challenges that human activity recognition shares with general pattern recognition and identify those challenges that are specific to human activity recognition. We then describe the concept of an Activity Recognition Chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems. We detail each component of the framework, provide references to related research, and introduce the best practice methods developed by the activity recognition research community. We conclude with the educational example problem of recognizing different hand gestures from inertial sensors attached to the upper and lower arm. We illustrate how each component of this framework can be implemented for this specific activity recognition problem and demonstrate how different implementations compare and how they impact overall recognition performance.

1,078 citations

Proceedings ArticleDOI
13 Sep 2014
TL;DR: Pupil is an accessible, affordable, and extensible open source platform for pervasive eye tracking and gaze-based interaction and includes state-of-the-art algorithms for real-time pupil detection and tracking, calibration, and accurate gaze estimation.
Abstract: In this paper we present Pupil -- an accessible, affordable, and extensible open source platform for pervasive eye tracking and gaze-based interaction Pupil comprises 1) a light-weight eye tracking headset, 2) an open source software framework for mobile eye tracking, as well as 3) a graphical user interface to playback and visualize video and gaze data Pupil features high-resolution scene and eye cameras for monocular and binocular gaze estimation The software and GUI are platform-independent and include state-of-the-art algorithms for real-time pupil detection and tracking, calibration, and accurate gaze estimation Results of a performance evaluation show that Pupil can provide an average gaze estimation accuracy of 06 degree of visual angle (008 degree precision) with a processing pipeline latency of only 0045 seconds

655 citations

Journal ArticleDOI
TL;DR: The work demonstrates the promise of eye-based activity recognition (EAR) and opens up discussion on the wider applicability of EAR to other activities that are difficult, or even impossible, to detect using common sensing modalities.
Abstract: In this work, we investigate eye movement analysis as a new sensing modality for activity recognition. Eye movement data were recorded using an electrooculography (EOG) system. We first describe and evaluate algorithms for detecting three eye movement characteristics from EOG signals-saccades, fixations, and blinks-and propose a method for assessing repetitive patterns of eye movements. We then devise 90 different features based on these characteristics and select a subset of them using minimum redundancy maximum relevance (mRMR) feature selection. We validate the method using an eight participant study in an office environment using an example set of five activity classes: copying a text, reading a printed paper, taking handwritten notes, watching a video, and browsing the Web. We also include periods with no specific activity (the NULL class). Using a support vector machine (SVM) classifier and person-independent (leave-one-person-out) training, we obtain an average precision of 76.1 percent and recall of 70.5 percent over all classes and participants. The work demonstrates the promise of eye-based activity recognition (EAR) and opens up discussion on the wider applicability of EAR to other activities that are difficult, or even impossible, to detect using common sensing modalities.

619 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: An extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including the MPIIGaze dataset, which contains 213,659 images collected from 15 participants during natural everyday laptop use over more than three months.
Abstract: Appearance-based gaze estimation is believed to work well in real-world settings, but existing datasets have been collected under controlled laboratory conditions and methods have been not evaluated across multiple datasets. In this work we study appearance-based gaze estimation in the wild. We present the MPIIGaze dataset that contains 213,659 images we collected from 15 participants during natural everyday laptop use over more than three months. Our dataset is significantly more variable than existing ones with respect to appearance and illumination. We also present a method for in-the-wild appearance-based gaze estimation using multimodal convolutional neural networks that significantly outperforms state-of-the art methods in the most challenging cross-dataset evaluation. We present an extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including our own. This evaluation provides clear insights and allows us to identify key research challenges of gaze estimation in the wild.

493 citations


Cited by
More filters
01 Jan 2006

3,012 citations

Proceedings Article
01 Jan 1999

2,010 citations

Journal ArticleDOI
18 Jan 2016-Sensors
TL;DR: A generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which is suitable for multimodal wearable sensors, does not require expert knowledge in designing features, and explicitly models the temporal dynamics of feature activations is proposed.
Abstract: Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing this temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average; outperforming some of the previous reported results by up to 9%. Our results show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise key architectural hyperparameters’ influence on performance to provide insights about their optimisation.

1,896 citations

01 Jan 1979
TL;DR: This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis and addressing interesting real-world computer Vision and multimedia applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that we have some classes containing lots of training data and many classes contain a small amount of training data. Therefore, how to use frequent classes to help learning rare classes for which it is harder to collect the training data is an open question. Learning with Shared Information is an emerging topic in machine learning, computer vision and multimedia analysis. There are different level of components that can be shared during concept modeling and machine learning stages, such as sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples, etc. Regarding the specific methods, multi-task learning, transfer learning and deep learning can be seen as using different strategies to share information. These learning with shared information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works, as well as literature reviews, are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged. Topics of interest include, but are not limited to: • Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis • Deep learning for large-scale computer vision and multimedia analysis • Multi-modal approach for large-scale computer vision and multimedia analysis • Different sharing strategies, e.g., sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples, • Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing. • New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem. • Survey papers regarding the topic of learning with shared information. Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

1,758 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: SimGAN as mentioned in this paper uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors, and achieves state-of-the-art results on the MPIIGaze dataset without any labeled real data.
Abstract: With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a model to improve the realism of a simulators output using unlabeled real data, while preserving the annotation information from the simulator. We develop a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors. We make several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training: (i) a self-regularization term, (ii) a local adversarial loss, and (iii) updating the discriminator using a history of refined images. We show that this enables generation of highly realistic images, which we demonstrate both qualitatively and with a user study. We quantitatively evaluate the generated images by training models for gaze estimation and hand pose estimation. We show a significant improvement over using synthetic images, and achieve state-of-the-art results on the MPIIGaze dataset without any labeled real data.

1,724 citations