
Showing papers on "Eye tracking published in 2018"


Book ChapterDOI
Zheng Zhu1, Qiang Wang1, Bo Li2, Wei Wu2, Junjie Yan2, Weiming Hu1 
08 Sep 2018
TL;DR: Zhu et al. propose a distractor-aware Siamese network for accurate and long-term tracking, which uses an effective sampling strategy to control the distribution of training data and make the model focus on the semantic distractors.
Abstract: Recently, Siamese networks have drawn great attention in visual tracking community because of their balanced accuracy and speed. However, features used in most Siamese tracking approaches can only discriminate foreground from the non-semantic backgrounds. The semantic backgrounds are always considered as distractors, which hinders the robustness of Siamese trackers. In this paper, we focus on learning distractor-aware Siamese networks for accurate and long-term tracking. To this end, features used in traditional Siamese trackers are analyzed at first. We observe that the imbalanced distribution of training data makes the learned features less discriminative. During the off-line training phase, an effective sampling strategy is introduced to control this distribution and make the model focus on the semantic distractors. During inference, a novel distractor-aware module is designed to perform incremental learning, which can effectively transfer the general embedding to the current video domain. In addition, we extend the proposed approach for long-term tracking by introducing a simple yet effective local-to-global search region strategy. Extensive experiments on benchmarks show that our approach significantly outperforms the state-of-the-arts, yielding 9.6% relative gain in VOT2016 dataset and 35.9% relative gain in UAV20L dataset. The proposed tracker can perform at 160 FPS on short-term benchmarks and 110 FPS on long-term benchmarks.

711 citations
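
A minimal Python sketch of the distractor-aware re-ranking idea from the abstract above, under stated assumptions: cosine similarity stands in for the learned Siamese score, and the weighting alpha and the set of collected distractor embeddings are illustrative placeholders rather than the authors' implementation.

```python
# Sketch of distractor-aware proposal re-ranking (illustrative, not DaSiamRPN's code).
import numpy as np

def rerank(exemplar, proposals, distractors, alpha=0.5):
    """exemplar: (d,), proposals: (n, d), distractors: (m, d) embedding vectors."""
    def sim(a, b):
        # cosine similarity as a stand-in for the learned Siamese score
        a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
        b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
        return a @ b.T

    scores = sim(proposals, exemplar[None, :]).ravel()       # similarity to the target
    if len(distractors):
        penalty = sim(proposals, distractors).mean(axis=1)   # similarity to distractors
        scores = scores - alpha * penalty                    # suppress distractor-like proposals
    return int(np.argmax(scores))                            # index of the selected proposal
```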


Posted Content
Zheng Zhu1, Qiang Wang1, Bo Li2, Wei Wu2, Junjie Yan2, Weiming Hu1 
TL;DR: This paper focuses on learning distractor-aware Siamese networks for accurate and long-term tracking, and extends the proposed approach for long-term tracking by introducing a simple yet effective local-to-global search region strategy.
Abstract: Recently, Siamese networks have drawn great attention in visual tracking community because of their balanced accuracy and speed. However, features used in most Siamese tracking approaches can only discriminate foreground from the non-semantic backgrounds. The semantic backgrounds are always considered as distractors, which hinders the robustness of Siamese trackers. In this paper, we focus on learning distractor-aware Siamese networks for accurate and long-term tracking. To this end, features used in traditional Siamese trackers are analyzed at first. We observe that the imbalanced distribution of training data makes the learned features less discriminative. During the off-line training phase, an effective sampling strategy is introduced to control this distribution and make the model focus on the semantic distractors. During inference, a novel distractor-aware module is designed to perform incremental learning, which can effectively transfer the general embedding to the current video domain. In addition, we extend the proposed approach for long-term tracking by introducing a simple yet effective local-to-global search region strategy. Extensive experiments on benchmarks show that our approach significantly outperforms the state-of-the-arts, yielding 9.6% relative gain in VOT2016 dataset and 35.9% relative gain in UAV20L dataset. The proposed tracker can perform at 160 FPS on short-term benchmarks and 110 FPS on long-term benchmarks.

644 citations


Journal ArticleDOI
30 Jul 2018
TL;DR: In this paper, a generative neural network with a novel space-time architecture is proposed to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor.
Abstract: We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulations of facial expressions only, we are the first to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, based on which it predicts photo-realistic video frames for a given target actor. The realism in this rendering-to-video transfer is achieved by careful adversarial training, and as a result, we can create modified target videos that mimic the behavior of the synthetically-created input. In order to enable source-to-target video re-animation, we render a synthetic target video with the reconstructed head animation parameters from a source video, and feed it into the trained network - thus taking full control of the target. With the ability to freely recombine source and target parameters, we are able to demonstrate a large variety of video rewrite applications without explicitly modeling hair, body or background. For instance, we can reenact the full head using interactive user-controlled editing, and realize high-fidelity visual dubbing. To demonstrate the high quality of our output, we conduct an extensive series of experiments and evaluations, where for instance a user study shows that our video edits are hard to detect.

611 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: A Residual Attentional Siamese Network (RASNet) for high performance object tracking that not only mitigates the over-fitting problem in deep network training, but also enhances its discriminative capacity and adaptability due to the separation of representation learning and discriminator learning.
Abstract: Offline training for object tracking has recently shown great potentials in balancing tracking accuracy and speed. However, it is still difficult to adapt an offline trained model to a target tracked online. This work presents a Residual Attentional Siamese Network (RASNet) for high performance object tracking. The RASNet model reformulates the correlation filter within a Siamese tracking framework, and introduces different kinds of the attention mechanisms to adapt the model without updating the model online. In particular, by exploiting the offline trained general attention, the target adapted residual attention, and the channel favored feature attention, the RASNet not only mitigates the over-fitting problem in deep network training, but also enhances its discriminative capacity and adaptability due to the separation of representation learning and discriminator learning. The proposed deep architecture is trained from end to end and takes full advantage of the rich spatial temporal information to achieve robust visual tracking. Experimental results on two latest benchmarks, OTB-2015 and VOT2017, show that the RASNet tracker has the state-of-the-art tracking accuracy while runs at more than 80 frames per second.

499 citations
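
The following is a rough numpy/scipy illustration of the attention-weighted Siamese correlation described above; it is a sketch rather than RASNet itself, and the spatial and channel attention arrays are assumed placeholder inputs rather than the trained general, residual, and channel attentions.

```python
# Illustrative attention-weighted template correlation (placeholder attentions).
import numpy as np
from scipy.signal import correlate2d

def attentional_response(template, search, spatial_attn, channel_attn):
    """template: (c, h, w), search: (c, H, W), spatial_attn: (h, w), channel_attn: (c,)."""
    out_h = search.shape[1] - template.shape[1] + 1
    out_w = search.shape[2] - template.shape[2] + 1
    response = np.zeros((out_h, out_w))
    for c in range(template.shape[0]):
        # reweight the template per channel and per location before correlating
        kernel = channel_attn[c] * spatial_attn * template[c]
        response += correlate2d(search[c], kernel, mode="valid")
    return response   # peak location indicates the target position
```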


Journal ArticleDOI
TL;DR: The background of deep visual tracking is introduced, including the fundamental concepts of visual tracking and related deep learning algorithms, and the existing deep-learning-based trackers are categorized into three classes according to network structure, network function and network training.

473 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper proposes an efficient multi-cue analysis framework for robust visual tracking that combines different types of features and constructs multiple experts through Discriminative Correlation Filters, each of which tracks the target independently.
Abstract: In recent years, many tracking algorithms achieve impressive performance via fusing multiple types of features, however, most of them fail to fully explore the context among the adopted multiple features and the strength of them. In this paper, we propose an efficient multi-cue analysis framework for robust visual tracking. By combining different types of features, our approach constructs multiple experts through Discriminative Correlation Filter (DCF) and each of them tracks the target independently. With the proposed robustness evaluation strategy, the suitable expert is selected for tracking in each frame. Furthermore, the divergence of multiple experts reveals the reliability of the current tracking, which is quantified to update the experts adaptively to keep them from corruption. Through the proposed multi-cue analysis, our tracker with standard DCF and deep features achieves outstanding results on several challenging benchmarks: OTB-2013, OTB-2015, Temple-Color and VOT 2016. On the other hand, when evaluated with only simple hand-crafted features, our method demonstrates comparable performance amongst complex non-realtime trackers, but exhibits much better efficiency, with a speed of 45 FPS on a CPU.

347 citations
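
To make the expert-selection step above more concrete, here is a hedged sketch in which each DCF expert contributes a response map and the most reliable one is chosen per frame; the peak-sharpness score used here is a simple proxy, not the paper's pairwise robustness evaluation strategy.

```python
# Illustrative per-frame expert selection from DCF response maps.
import numpy as np

def peak_sharpness(response):
    # simple reliability proxy: how much the peak stands out from the rest of the map
    return (response.max() - response.mean()) / (response.std() + 1e-8)

def select_expert(response_maps):
    """response_maps: list of 2-D response maps, one per feature/expert."""
    scores = [peak_sharpness(r) for r in response_maps]
    best = int(np.argmax(scores))
    peak = np.unravel_index(np.argmax(response_maps[best]), response_maps[best].shape)
    return best, peak   # chosen expert and its estimated target location
```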


Book ChapterDOI
08 Sep 2018
TL;DR: In this paper, a dynamic memory network is proposed to adapt the template to the target's appearance variations during tracking; an LSTM is used as a memory controller, where the input is the search feature map and the outputs are the control signals for the reading and writing of the memory block.
Abstract: Template-matching methods for visual tracking have gained popularity recently due to their comparable performance and fast speed. However, they lack effective ways to adapt to changes in the target object’s appearance, making their tracking accuracy still far from state-of-the-art. In this paper, we propose a dynamic memory network to adapt the template to the target’s appearance variations during tracking. An LSTM is used as a memory controller, where the input is the search feature map and the outputs are the control signals for the reading and writing process of the memory block. As the location of the target is at first unknown in the search feature map, an attention mechanism is applied to concentrate the LSTM input on the potential target. To prevent aggressive model adaptivity, we apply gated residual template learning to control the amount of retrieved memory that is used to combine with the initial template. Unlike tracking-by-detection methods where the object’s information is maintained by the weight parameters of neural networks, which requires expensive online fine-tuning to be adaptable, our tracker runs completely feed-forward and adapts to the target’s appearance changes by updating the external memory. Moreover, unlike other tracking methods where the model capacity is fixed after offline training, the capacity of our tracker can be easily enlarged as the memory requirements of a task increase, which is favorable for memorizing long-term object information. Extensive experiments on OTB and VOT demonstrate that our tracker MemTrack performs favorably against state-of-the-art tracking methods while retaining a real-time speed of 50 fps.

264 citations
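
A small sketch of the gated residual template learning mentioned above, assuming the retrieved memory and the initial template share one feature-map shape; the gate computation is a placeholder for whatever the network would produce, so this illustrates only the blending rule, not the MemTrack implementation.

```python
# Illustrative gated residual template update (blending rule only).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def updated_template(initial_template, retrieved_memory, gate_logits):
    """All arguments share the template feature-map shape, e.g. (c, h, w)."""
    gate = sigmoid(gate_logits)                    # elementwise gate in (0, 1)
    # the fixed first-frame template is kept, and only a gated residual is added,
    # which limits how far the model can drift from the initial appearance
    return initial_template + gate * retrieved_memory
```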


Proceedings ArticleDOI
21 Apr 2018
TL;DR: Interfaces communicating vehicle awareness and intent are found to help pedestrians attempting to cross, are not limited to the vehicle and can exist in the environment, and should use a combination of modalities such as visual, auditory, and physical.
Abstract: Drivers use nonverbal cues such as vehicle speed, eye gaze, and hand gestures to communicate awareness and intent to pedestrians. Conversely, in autonomous vehicles, drivers can be distracted or absent, leaving pedestrians to infer awareness and intent from the vehicle alone. In this paper, we investigate the usefulness of interfaces (beyond vehicle movement) that explicitly communicate awareness and intent of autonomous vehicles to pedestrians, focusing on crosswalk scenarios. We conducted a preliminary study to gain insight on designing interfaces that communicate autonomous vehicle awareness and intent to pedestrians. Based on study outcomes, we developed four prototype interfaces and deployed them in studies involving a Segway and a car. We found interfaces communicating vehicle awareness and intent: (1) can help pedestrians attempting to cross; (2) are not limited to the vehicle and can exist in the environment; and (3) should use a combination of modalities such as visual, auditory, and physical.

240 citations


Book ChapterDOI
08 Sep 2018
TL;DR: This work addresses the issue of ground truth annotation by measuring head pose using a motion capture system and eye gaze using mobile eye-tracking glasses, and applies semantic image inpainting to the area covered by the glasses to bridge the gap between training and testing images by removing the obtrusiveness of the glasses.
Abstract: In this work, we consider the problem of robust gaze estimation in natural environments. Large camera-to-subject distances and high variations in head pose and eye gaze angles are common in such environments. This leads to two main shortfalls in state-of-the-art methods for gaze estimation: hindered ground truth gaze annotation and diminished gaze estimation accuracy as image resolution decreases with distance. We first record a novel dataset of varied gaze and head pose images in a natural environment, addressing the issue of ground truth annotation by measuring head pose using a motion capture system and eye gaze using mobile eyetracking glasses. We apply semantic image inpainting to the area covered by the glasses to bridge the gap between training and testing images by removing the obtrusiveness of the glasses. We also present a new real-time algorithm involving appearance-based deep convolutional neural networks with increased capacity to cope with the diverse images in the new dataset. Experiments with this network architecture are conducted on a number of diverse eye-gaze datasets including our own, and in cross dataset evaluations. We demonstrate state-of-the-art performance in terms of estimation accuracy in all experiments, and the architecture performs well even on lower resolution images.

233 citations


Journal ArticleDOI
TL;DR: A systematic review of eye tracking research in the domain of multimedia learning explores how cognitive processes in multimedia learning are studied with relevant variables through eye tracking technology to offer suggestions for future research and practices.
Abstract: This study provides a current systematic review of eye tracking research in the domain of multimedia learning. The particular aim of the review is to explore how cognitive processes in multimedia learning are studied with relevant variables through eye tracking technology. To this end, 52 articles, including 58 studies, were analyzed. Remarkable results are that (1) there is a burgeoning interest in the use of eye tracking technology in multimedia learning research; (2) studies were mostly conducted with college students, science materials, and the temporal and count scales of eye tracking measurements; (3) eye movement measurements provided inferences about the cognitive processes of selecting, organizing, and integrating; (4) multimedia learning principles, multimedia content, individual differences, metacognition, and emotions were the potential factors that can affect eye movement measurements; and (5) findings were available for supporting the association between cognitive processes inferred by eye tracking measurements and learning performance. Specific gaps in the literature and implications of existing findings on multimedia learning design were also determined to offer suggestions for future research and practices.

201 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper presents the large-scale eye-tracking in dynamic VR scene dataset, and proposes to compute saliency maps at different spatial scales: the sub-image patch centered at the current gaze point, the sub-image corresponding to the Field of View (FoV), and the panorama image.
Abstract: This paper explores gaze prediction in dynamic 360° immersive videos, i.e., based on the history scan path and VR contents, we predict where a viewer will look at an upcoming time. To tackle this problem, we first present the large-scale eye-tracking in dynamic VR scene dataset. Our dataset contains 208 360° videos captured in dynamic scenes, and each video is viewed by at least 31 subjects. Our analysis shows that gaze prediction depends on its history scan path and image contents. In terms of the image contents, those salient objects easily attract viewers' attention. On the one hand, the saliency is related to both appearance and motion of the objects. Considering that the saliency measured at different scales is different, we propose to compute saliency maps at different spatial scales: the sub-image patch centered at current gaze point, the sub-image corresponding to the Field of View (FoV), and the panorama image. Then we feed both the saliency maps and the corresponding images into a Convolutional Neural Network (CNN) for feature extraction. Meanwhile, we also use a Long-Short-Term-Memory (LSTM) to encode the history scan path. Then we combine the CNN features and LSTM features for gaze displacement prediction between gaze point at a current time and gaze point at an upcoming time. Extensive experiments validate the effectiveness of our method for gaze prediction in dynamic VR scenes.
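
The following hedged PyTorch sketch mirrors the pipeline described in the abstract above: multi-scale image/saliency inputs pass through a small CNN, the history scan path through an LSTM, and the fused features regress the gaze displacement. All layer sizes, the stacked-channel input format, and the module names are assumptions, not the authors' exact architecture.

```python
# Illustrative multi-scale CNN + scan-path LSTM for gaze displacement regression.
import torch
import torch.nn as nn

class GazeDisplacementNet(nn.Module):
    def __init__(self, in_channels=9, hidden=64):
        super().__init__()
        # in_channels: stacked image + saliency planes for the patch, FoV and panorama scales
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(32 + hidden, 2)   # predicted (dx, dy) gaze displacement

    def forward(self, frames, scanpath):
        # frames: (B, in_channels, H, W); scanpath: (B, T, 2) past gaze points
        img_feat = self.cnn(frames)
        _, (h, _) = self.lstm(scanpath)
        return self.head(torch.cat([img_feat, h[-1]], dim=1))
```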

Proceedings ArticleDOI
19 Apr 2018
TL;DR: This work investigates precise, multimodal selection techniques using head motion and eye gaze for augmented reality applications, including compact menus with deep structure, and a proof-of-concept method for on-line correction of calibration drift.
Abstract: Head and eye movement can be leveraged to improve the user's interaction repertoire for wearable displays. Head movements are deliberate and accurate, and provide the current state-of-the-art pointing technique. Eye gaze can potentially be faster and more ergonomic, but suffers from low accuracy due to calibration errors and drift of wearable eye-tracking sensors. This work investigates precise, multimodal selection techniques using head motion and eye gaze. A comparison of speed and pointing accuracy reveals the relative merits of each method, including the achievable target size for robust selection. We demonstrate and discuss example applications for augmented reality, including compact menus with deep structure, and a proof-of-concept method for on-line correction of calibration drift.

Journal ArticleDOI
14 Sep 2018-PLOS ONE
TL;DR: Inter-trial change in pupil diameter and microsaccade magnitude appear to adequately discriminate task difficulty, and hence cognitive load, if the implied causality can be assumed; the reliability and sensitivity of task-evoked pupillary and microsaccadic measures of cognitive load are compared.
Abstract: Pupil diameter and microsaccades are captured by an eye tracker and compared for their suitability as indicators of cognitive load (as beset by task difficulty). Specifically, two metrics are tested in response to task difficulty: (1) the change in pupil diameter with respect to inter- or intra-trial baseline, and (2) the rate and magnitude of microsaccades. Participants performed easy and difficult mental arithmetic tasks while fixating a central target. Inter-trial change in pupil diameter and microsaccade magnitude appear to adequately discriminate task difficulty, and hence cognitive load, if the implied causality can be assumed. This paper’s contribution corroborates previous work concerning microsaccade magnitude and extends this work by directly comparing microsaccade metrics to pupillometric measures. To our knowledge this is the first study to compare the reliability and sensitivity of task-evoked pupillary and microsaccadic measures of cognitive load.
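
As a concrete reading of the first metric above, here is an illustrative helper (an assumption, not the paper's analysis code) that computes the task-evoked change in pupil diameter relative to an inter-trial baseline window.

```python
# Illustrative task-evoked pupil response relative to an inter-trial baseline.
import numpy as np

def pupil_change(trial_diameter, baseline_diameter):
    """Both arguments are 1-D arrays of pupil diameter samples (e.g. in mm)."""
    baseline = np.nanmean(baseline_diameter)       # pre-trial baseline level
    return np.nanmean(trial_diameter) - baseline   # positive values suggest higher load
```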

Journal ArticleDOI
TL;DR: In this paper, the problem of learning deep fully convolutional features for correlation filter based (CFB) visual tracking is formulated, and a novel and efficient backpropagation algorithm based on the loss function of the network is presented.
Abstract: In recent years, correlation filters have shown dominant and spectacular results for visual object tracking. The types of the features that are employed in this family of trackers significantly affect the performance of visual tracking. The ultimate goal is to utilize the robust features invariant to any kind of appearance change of the object, while predicting the object location as properly as in the case of no appearance change. As the deep learning based methods have emerged, the study of learning features for specific tasks has accelerated. For instance, discriminative visual tracking methods based on deep architectures have been studied with promising performance. Nevertheless, correlation filter based (CFB) trackers confine themselves to use the pre-trained networks, which are trained for the object classification problem. To this end, in this manuscript the problem of learning deep fully convolutional features for CFB visual tracking is formulated. In order to learn the proposed model, a novel and efficient backpropagation algorithm is presented based on the loss function of the network. The proposed learning framework enables the network model to be flexible for a custom design. Moreover, it alleviates the dependency on the network trained for classification. Extensive performance analysis shows the efficacy of the proposed custom design in the CFB tracking framework. By fine-tuning the convolutional parts of a state-of-the-art network and integrating this model into a CFB tracker, which is the top performing one of VOT2016, an 18% increase is achieved in terms of expected average overlap, and tracking failures are decreased by 25%, while maintaining the superiority over the state-of-the-art methods on the OTB-2013 and OTB-2015 tracking datasets.
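
For context, the correlation-filter building block that CFB trackers rely on can be written compactly; the sketch below is a textbook single-channel ridge-regression filter solved in the Fourier domain, not the paper's feature-learning framework or its backpropagation algorithm.

```python
# Textbook single-channel correlation filter in the Fourier domain (illustrative).
import numpy as np

def train_filter(feature_map, gaussian_label, lam=1e-2):
    X = np.fft.fft2(feature_map)
    Y = np.fft.fft2(gaussian_label)
    # closed-form ridge-regression solution; lam is the regularization weight
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def detect(filter_fft, feature_map):
    response = np.real(np.fft.ifft2(filter_fft * np.fft.fft2(feature_map)))
    return np.unravel_index(np.argmax(response), response.shape)   # target location
```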

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper presents a novel dataset of 360° videos with associated eye and head movement data, a follow-up to the previous dataset for still images; the dataset and its associated code are made publicly available to support research on visual attention for 360° content.
Abstract: Research on visual attention in 360° content is crucial to understand how people perceive and interact with this immersive type of content and to develop efficient techniques for processing, encoding, delivering and rendering, and also to offer a high quality of experience to end users. The availability of public datasets is essential to support and facilitate research activities of the community. Recently, some studies have been presented analyzing exploration behaviors of people watching 360° videos, and a few datasets have been published. However, the majority of these works only consider head movements as a proxy for gaze data, despite the importance of eye movements in the exploration of omnidirectional content. Thus, this paper presents a novel dataset of 360° videos with associated eye and head movement data, which is a follow-up to our previous dataset for still images [14]. Head and eye tracking data was obtained from 57 participants during a free-viewing experiment with 19 videos. In addition, guidelines on how to obtain saliency maps and scanpaths from raw data are provided. Also, some statistics related to exploration behaviors are presented, such as the impact of the longitudinal starting position when watching omnidirectional videos, which was investigated in this test. This dataset and its associated code are made publicly available to support research on visual attention for 360° content.
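
The dataset paper provides its own guidelines for deriving saliency maps and scanpaths; the snippet below is only a generic sketch of the usual recipe, assuming an equirectangular frame and fixations in normalized coordinates: accumulate fixation locations and smooth with a Gaussian kernel.

```python
# Generic fixation-based saliency map (assumed conventions, not the paper's tooling).
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_map(fixations, height=1024, width=2048, sigma_px=30):
    """fixations: iterable of (x, y) pairs normalized to [0, 1)."""
    sal = np.zeros((height, width), dtype=float)
    for x, y in fixations:
        sal[int(y * height) % height, int(x * width) % width] += 1.0   # accumulate hits
    sal = gaussian_filter(sal, sigma=sigma_px)                         # spatial smoothing
    return sal / sal.max() if sal.max() > 0 else sal                   # normalize to [0, 1]
```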

Proceedings ArticleDOI
01 Jun 2018
TL;DR: The positive samples generation network (PSGN) is introduced to sample massive, diverse training data by traversing the constructed target object manifold; the generated diverse target object images can enrich the training dataset and enhance the robustness of visual trackers.
Abstract: Existing visual trackers are easily disturbed by occlusion, blur and large deformation. We think the performance of existing visual trackers may be limited due to the following issues: i) Adopting the dense sampling strategy to generate positive examples will make them less diverse; ii) The training data with different challenging factors are limited, even when collecting a large training dataset. Collecting an even larger training dataset is the most intuitive paradigm, but it may still not cover all situations and the positive samples are still monotonous. In this paper, we propose to generate hard positive samples via adversarial learning for visual tracking. Specifically, we assume the target objects all lie on a manifold; hence, we introduce the positive samples generation network (PSGN) to sample massive, diverse training data by traversing the constructed target object manifold. The generated diverse target object images can enrich the training dataset and enhance the robustness of visual trackers. To make the tracker more robust to occlusion, we adopt the hard positive transformation network (HPTN), which can generate hard samples for the tracking algorithm to recognize. We train this network with deep reinforcement learning to automatically occlude the target object with a negative patch. Based on the generated hard positive samples, we train a Siamese network for visual tracking, and our experiments validate the effectiveness of the introduced algorithm. The project page of this paper can be found at the website1.

Journal ArticleDOI
TL;DR: It is shown that eye movements during an everyday task predict aspects of the authors' personality, and new relations between previously neglected eye movement characteristics and personality are revealed.
Abstract: Besides allowing us to perceive our surroundings, eye movements are also a window into our mind and a rich source of information on who we are, how we feel, and what we do. Here we show that eye movements during an everyday task predict aspects of our personality. We tracked eye movements of 42 participants while they ran an errand on a university campus and subsequently assessed their personality traits using well-established questionnaires. Using a state-of-the-art machine learning method and a rich set of features encoding different eye movement characteristics, we were able to reliably predict four of the Big Five personality traits (neuroticism, extraversion, agreeableness, conscientiousness) as well as perceptual curiosity only from eye movements. Further analysis revealed new relations between previously neglected eye movement characteristics and personality. Our findings demonstrate a considerable influence of personality on everyday eye movement control, thereby complementing earlier studies in laboratory settings. Improving automatic recognition and interpretation of human social signals is an important endeavor, enabling innovative design of human–computer systems capable of sensing spontaneous natural user behavior to facilitate efficient interaction and personalization.

Book ChapterDOI
08 Sep 2018
TL;DR: In this paper, instead of directly regressing two angles for the pitch and yaw of the eyeball, the authors regress to an intermediate pictorial representation, which in turn simplifies the task of 3D gaze direction estimation.
Abstract: Estimating human gaze from natural eye images only is a challenging task. Gaze direction can be defined by the pupil- and the eyeball center where the latter is unobservable in 2D images. Hence, achieving highly accurate gaze estimates is an ill-posed problem. In this paper, we introduce a novel deep neural network architecture specifically designed for the task of gaze estimation from single eye input. Instead of directly regressing two angles for the pitch and yaw of the eyeball, we regress to an intermediate pictorial representation which in turn simplifies the task of 3D gaze direction estimation. Our quantitative and qualitative results show that our approach achieves higher accuracies than the state-of-the-art and is robust to variation in gaze, head pose and image quality.

Journal ArticleDOI
TL;DR: Using statistical features extracted from speaking behaviour, eye activity, and head pose, the behaviour associated with major depression is characterised, and the classification performance of individual modalities and of their fusion is examined.
Abstract: An estimated 350 million people worldwide are affected by depression. Using affective sensing technology, our long-term goal is to develop an objective multimodal system that augments clinical opinion during the diagnosis and monitoring of clinical depression. This paper steps towards developing a classification system-oriented approach, where feature selection, classification and fusion-based experiments are conducted to infer which types of behaviour (verbal and nonverbal) and behaviour combinations can best discriminate between depression and non-depression. Using statistical features extracted from speaking behaviour, eye activity, and head pose, we characterise the behaviour associated with major depression and examine the performance of the classification of individual modalities and when fused. Using a real-world, clinically validated dataset of 30 severely depressed patients and 30 healthy control subjects, a Support Vector Machine is used for classification with several feature selection techniques. Given the statistical nature of the extracted features, feature selection based on T-tests performed better than other methods. Individual modality classification results were considerably higher than chance level (83 percent for speech, 73 percent for eye, and 63 percent for head). Fusing all modalities shows a remarkable improvement compared to unimodal systems, which demonstrates the complementary nature of the modalities. Among the different fusion approaches used here, feature fusion performed best with up to 88 percent average accuracy. We believe that is due to the compatible nature of the extracted statistical features.
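
A hedged scikit-learn sketch of the kind of pipeline the abstract above describes: univariate feature selection followed by a linear Support Vector Machine over statistical features. The number of selected features is an arbitrary placeholder, and f_classif (an ANOVA F-test, equivalent to a t-test for two classes) stands in for the paper's T-test-based selection.

```python
# Illustrative depression-vs-control classifier: feature selection + linear SVM.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

clf = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=20)),  # univariate (t/F-test style) selection
    ("svm", SVC(kernel="linear")),
])
# Typical use (X: per-subject statistical features, y: depressed vs. control labels):
# clf.fit(X_train, y_train); accuracy = clf.score(X_test, y_test)
```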

Posted Content
TL;DR: In this article, a reciprocative learning algorithm was proposed to exploit visual attention for training deep classifiers, which consists of feed-forward and backward operations to generate attention maps, which serve as regularization terms coupled with the original classification loss function for training.
Abstract: Visual attention, derived from cognitive neuroscience, facilitates human perception on the most pertinent subset of the sensory data. Recently, significant efforts have been made to exploit attention schemes to advance computer vision systems. For visual tracking, it is often challenging to track target objects undergoing large appearance changes. Attention maps facilitate visual tracking by selectively paying attention to temporal robust features. Existing tracking-by-detection approaches mainly use additional attention modules to generate feature weights as the classifiers are not equipped with such mechanisms. In this paper, we propose a reciprocative learning algorithm to exploit visual attention for training deep classifiers. The proposed algorithm consists of feed-forward and backward operations to generate attention maps, which serve as regularization terms coupled with the original classification loss function for training. The deep classifier learns to attend to the regions of target objects robust to appearance changes. Extensive experiments on large-scale benchmark datasets show that the proposed attentive tracking method performs favorably against the state-of-the-art approaches.

Journal ArticleDOI
TL;DR: It is shown that a fully automated classification of raw gaze samples as belonging to fixations, saccades, or other oculomotor events can be achieved using a machine-learning approach, which leads to superior detection compared to current state-of-the-art event detection algorithms.
Abstract: Event detection is a challenging stage in eye movement data analysis. A major drawback of current event detection methods is that parameters have to be adjusted based on eye movement data quality. Here we show that a fully automated classification of raw gaze samples as belonging to fixations, saccades, or other oculomotor events can be achieved using a machine-learning approach. Any already manually or algorithmically detected events can be used to train a classifier to produce similar classification of other data without the need for a user to set parameters. In this study, we explore the application of random forest machine-learning technique for the detection of fixations, saccades, and post-saccadic oscillations (PSOs). In an effort to show practical utility of the proposed method to the applications that employ eye movement classification algorithms, we provide an example where the method is employed in an eye movement-driven biometric application. We conclude that machine-learning techniques lead to superior detection compared to current state-of-the-art event detection algorithms and can reach the performance of manual coding.
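
The following is an illustrative sketch of the machine-learning approach described above: per-sample features such as speed and acceleration feed a random forest that labels each raw gaze sample as fixation, saccade, or post-saccadic oscillation. The feature set, sampling-rate handling, and training data are assumptions rather than the study's exact setup.

```python
# Illustrative sample-level event classification with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sample_features(x, y, fs):
    """x, y: gaze position traces in degrees; fs: sampling rate in Hz."""
    vx, vy = np.gradient(x) * fs, np.gradient(y) * fs
    speed = np.hypot(vx, vy)                 # deg/s
    accel = np.gradient(speed) * fs          # deg/s^2
    return np.column_stack([speed, accel])   # one feature row per gaze sample

# labels per sample: 0 = fixation, 1 = saccade, 2 = PSO (from manual or algorithmic coding)
forest = RandomForestClassifier(n_estimators=200)
# forest.fit(sample_features(x_train, y_train_pos, fs=1000), labels_train)
# predictions = forest.predict(sample_features(x_test, y_test_pos, fs=1000))
```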

Journal ArticleDOI
11 Jan 2018
TL;DR: Re3, a real-time deep object tracker capable of incorporating temporal information into its model, is presented; robust tracking requires knowledge and understanding of the object being tracked: its appearance, its motion, and how it changes over time.
Abstract: Robust object tracking requires knowledge and understanding of the object being tracked: its appearance, its motion, and how it changes over time. A tracker must be able to modify its underlying model and adapt to new observations. We present Re3, a real-time deep object tracker capable of incorporating temporal information into its model. Rather than focusing on a limited set of objects or training a model at test-time to track a specific instance, we pretrain our generic tracker on a large variety of objects and efficiently update on the fly; Re3 simultaneously tracks and updates the appearance model with a single forward pass. This lightweight model is capable of tracking objects at 150 FPS while attaining competitive results on challenging benchmarks. We also show that our method handles temporary occlusion better than other comparable trackers using experiments that directly measure performance on sequences with occlusion.

Journal ArticleDOI
TL;DR: The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD), as well as a new data-driven approach for eye tracking from monocular videos.
Abstract: We propose FaceVR, a novel image-based method that enables video teleconferencing in VR based on self-reenactment. State-of-the-art face tracking methods in the VR context are focused on the animation of rigged 3D avatars (Li et al. 2015; Olszewski et al. 2016). Although they achieve good tracking performance, the results look cartoonish and not real. In contrast to these model-based approaches, FaceVR enables VR teleconferencing using an image-based technique that results in nearly photo-realistic outputs. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD), as well as a new data-driven approach for eye tracking from monocular videos. Based on reenactment of a prerecorded stereo video of the person without the HMD, FaceVR incorporates photo-realistic re-rendering in real time, thus allowing artificial modifications of face and eye appearances. For instance, we can alter facial expressions or change gaze directions in the prerecorded target video. In a live setup, we apply these newly introduced algorithmic components.

Journal ArticleDOI
TL;DR: This study provides practical insight into how popular remote eye-trackers perform when recording from unrestrained participants, and provides a testing method for evaluating whether a tracker is suitable for studying a certain target population, which manufacturers can also use during the development of new eye-trackers.
Abstract: The marketing materials of remote eye-trackers suggest that data quality is invariant to the position and orientation of the participant as long as the eyes of the participant are within the eye-tracker's headbox, the area where tracking is possible. As such, remote eye-trackers are marketed as allowing the reliable recording of gaze from participant groups that cannot be restrained, such as infants, schoolchildren and patients with muscular or brain disorders. Practical experience and previous research, however, tells us that eye-tracking data quality, e.g. the accuracy of the recorded gaze position and the amount of data loss, deteriorates (compared to well-trained participants in chinrests) when the participant is unrestrained and assumes a non-optimal pose in front of the eye-tracker. How then can researchers working with unrestrained participants choose an eye-tracker? Here we investigated the performance of five popular remote eye-trackers from EyeTribe, SMI, SR Research, and Tobii in a series of tasks where participants took on non-optimal poses. We report that the tested systems varied in the amount of data loss and systematic offsets observed during our tasks. The EyeLink and EyeTribe in particular had large problems. Furthermore, the Tobii eye-trackers reported data for two eyes when only one eye was visible to the eye-tracker. This study provides practical insight into how popular remote eye-trackers perform when recording from unrestrained participants. It furthermore provides a testing method for evaluating whether a tracker is suitable for studying a certain target population, and that manufacturers can use during the development of new eye-trackers.

Proceedings Article
09 Oct 2018
TL;DR: In this paper, a reciprocative learning algorithm was proposed to exploit visual attention for training deep classifiers, which consists of feed-forward and backward operations to generate attention maps, which serve as regularization terms coupled with the original classification loss function for training.
Abstract: Visual attention, derived from cognitive neuroscience, facilitates human perception on the most pertinent subset of the sensory data. Recently, significant efforts have been made to exploit attention schemes to advance computer vision systems. For visual tracking, it is often challenging to track target objects undergoing large appearance changes. Attention maps facilitate visual tracking by selectively paying attention to temporal robust features. Existing tracking-by-detection approaches mainly use additional attention modules to generate feature weights as the classifiers are not equipped with such mechanisms. In this paper, we propose a reciprocative learning algorithm to exploit visual attention for training deep classifiers. The proposed algorithm consists of feed-forward and backward operations to generate attention maps, which serve as regularization terms coupled with the original classification loss function for training. The deep classifier learns to attend to the regions of target objects robust to appearance changes. Extensive experiments on large-scale benchmark datasets show that the proposed attentive tracking method performs favorably against the state-of-the-art approaches.
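
To make the training objective described above more tangible, here is a heavily simplified PyTorch sketch: the gradient of the target-class score with respect to the input serves as an attention map, and a term derived from it is added to the classification loss. The particular regularizer and the weighting lam are illustrative choices, not the authors' formulation.

```python
# Simplified attention-as-regularization training step (illustrative only).
import torch
import torch.nn.functional as F

def attention_regularized_loss(model, patch, label, lam=0.1):
    """patch: (B, C, H, W) input samples; label: (B,) class indices."""
    patch = patch.clone().requires_grad_(True)
    logits = model(patch)
    cls_loss = F.cross_entropy(logits, label)
    # backward pass through the target-class score yields an input attention map
    score = logits.gather(1, label.unsqueeze(1)).sum()
    grad = torch.autograd.grad(score, patch, create_graph=True)[0]
    attn = grad.abs().mean(dim=1)                              # (B, H, W) attention maps
    # one simple regularizer choice: favor spatially concentrated attention
    reg = (1.0 / (attn.flatten(1).var(dim=1) + 1e-6)).mean()
    return cls_loss + lam * reg
```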

Book ChapterDOI
08 Sep 2018
TL;DR: This paper proposes the Asymmetric Regression-Evaluation Network (ARE-Net) and tries to improve the gaze estimation performance to its full extent, achieving promising results and surpassing the state-of-the-art methods on multiple public datasets.
Abstract: Eye gaze estimation has been increasingly demanded by recent intelligent systems to accomplish a range of interaction-related tasks, by using simple eye images as input. However, learning the highly complex regression between eye images and gaze directions is nontrivial, and thus the problem is yet to be solved efficiently. In this paper, we propose the Asymmetric Regression-Evaluation Network (ARE-Net), and try to improve the gaze estimation performance to its full extent. At the core of our method is the notion of “two eye asymmetry” observed during gaze estimation for the left and right eyes. Inspired by this, we design the multi-stream ARE-Net; one asymmetric regression network (AR-Net) predicts 3D gaze directions for both eyes with a novel asymmetric strategy, and the evaluation network (E-Net) adaptively adjusts the strategy by evaluating the two eyes in terms of their performance during optimization. By training the whole network, our method achieves promising results and surpasses the state-of-the-art methods on multiple public datasets.

Journal ArticleDOI
TL;DR: Researchers are urged to make their definitions of fixations and saccades more explicit by specifying all the relevant components of the eye movement under investigation, to enable eye-movement researchers from different fields to have a discussion without misunderstandings.
Abstract: Eye movements have been extensively studied in a wide range of research fields. While new methods such as mobile eye tracking and eye tracking in virtual/augmented realities are emerging quickly, the eye-movement terminology has scarcely been revised. We assert that this may cause confusion about two of the main concepts: fixations and saccades. In this study, we assessed the definitions of fixations and saccades held in the eye-movement field, by surveying 124 eye-movement researchers. These eye-movement researchers held a variety of definitions of fixations and saccades, of which the breadth seems even wider than what is reported in the literature. Moreover, these definitions did not seem to be related to researcher background or experience. We urge researchers to make their definitions more explicit by specifying all the relevant components of the eye movement under investigation: (i) the oculomotor component: e.g. whether the eye moves slow or fast; (ii) the functional component: what purposes does the eye movement (or lack thereof) serve; (iii) the coordinate system used: relative to what does the eye move; (iv) the computational definition: how is the event represented in the eye-tracker signal. This should enable eye-movement researchers from different fields to have a discussion without misunderstandings.

Journal ArticleDOI
TL;DR: In this paper, a real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze is proposed.
Abstract: We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel realtime reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing, and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose a robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show significant improvements in enabling much greater flexibility in creating realistic reenacted output videos.

Proceedings ArticleDOI
15 Jun 2018
TL;DR: It is shown that eye-gaze outperforms head-gaze in terms of speed, task load, required head movement and user preference, and that the advantages of eye-gaze further increase with larger FOV sizes.
Abstract: The current best practice for hands-free selection using Virtual and Augmented Reality (VR/AR) head-mounted displays is to use head-gaze for aiming and dwell-time or clicking for triggering the selection. There is an observable trend for new VR and AR devices to come with integrated eye-tracking units to improve rendering, to provide means for attention analysis or for social interactions. Eye-gaze has been successfully used for human-computer interaction in other domains, primarily on desktop computers. In VR/AR systems, aiming via eye-gaze could be significantly faster and less exhausting than via head-gaze. To evaluate benefits of eye-gaze-based interaction methods in VR and AR, we compared aiming via head-gaze and aiming via eye-gaze. We show that eye-gaze outperforms head-gaze in terms of speed, task load, required head movement and user preference. We furthermore show that the advantages of eye-gaze further increase with larger FOV sizes.

Journal ArticleDOI
TL;DR: It is argued that the subjective perception of eye contact is a product of mutual face gaze instead of actual mutual eye contact, and the existence of an eye-mouth gaze continuum is suggested.
Abstract: We report the personal eye gaze patterns of people engaged in face-to-face getting acquainted conversation. Considerable differences between individuals are underscored by a stability of eye gaze patterns within individuals. Results suggest the existence of an eye-mouth gaze continuum. This continuum includes some people showing a strong preference for eye gaze, some with a strong preference for mouth gaze, and others distributing their gaze between the eyes and mouth to varying extents. Additionally, we found evidence of within-participant consistency not just for location preference but also for the duration of fixations upon the eye and mouth regions. We also estimate that during a 4-minute getting acquainted conversation mutual face gaze constitutes about 60% of conversation that occurs via typically brief instances of 2.2 seconds. Mutual eye contact ranged from 0–45% of conversation, via very brief instances. This was despite participants subjectively perceiving eye contact occurring for about 70% of conversation. We argue that the subjective perception of eye contact is a product of mutual face gaze instead of actual mutual eye contact. We also outline the fast activity of gaze movements upon various locations both on and off face during a typical face-to-face conversation.