
Showing papers on "Eye tracking published in 2016"


Posted Content
TL;DR: iTracker, a convolutional neural network for eye tracking, is trained, which achieves a significant reduction in error over previous approaches while running in real time (10-15fps) on a modern mobile device.
Abstract: From scientific research to commercial applications, eye tracking is an important tool across many domains. Despite its range of applications, eye tracking has yet to become a pervasive technology. We believe that we can put the power of eye tracking in everyone's palm by building eye tracking software that works on commodity hardware such as mobile phones and tablets, without the need for additional sensors or devices. We tackle this problem by introducing GazeCapture, the first large-scale dataset for eye tracking, containing data from over 1450 people consisting of almost 2.5M frames. Using GazeCapture, we train iTracker, a convolutional neural network for eye tracking, which achieves a significant reduction in error over previous approaches while running in real time (10-15fps) on a modern mobile device. Our model achieves a prediction error of 1.71cm and 2.53cm without calibration on mobile phones and tablets respectively. With calibration, this is reduced to 1.34cm and 2.12cm. Further, we demonstrate that the features learned by iTracker generalize well to other datasets, achieving state-of-the-art results. The code, data, and models are available at http://gazecapture.csail.mit.edu.

535 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: GazeCapture, the first large-scale dataset for eye tracking, contains data from over 1450 people and almost 2.5M frames; it is used to train iTracker, a convolutional neural network which achieves a significant reduction in error over previous approaches while running in real time (10-15fps) on a modern mobile device.
Abstract: From scientific research to commercial applications, eye tracking is an important tool across many domains. Despite its range of applications, eye tracking has yet to become a pervasive technology. We believe that we can put the power of eye tracking in everyone's palm by building eye tracking software that works on commodity hardware such as mobile phones and tablets, without the need for additional sensors or devices. We tackle this problem by introducing GazeCapture, the first large-scale dataset for eye tracking, containing data from over 1450 people consisting of almost 2.5M frames. Using GazeCapture, we train iTracker, a convolutional neural network for eye tracking, which achieves a significant reduction in error over previous approaches while running in real time (10–15fps) on a modern mobile device. Our model achieves a prediction error of 1.71cm and 2.53cm without calibration on mobile phones and tablets respectively. With calibration, this is reduced to 1.34cm and 2.12cm. Further, we demonstrate that the features learned by iTracker generalize well to other datasets, achieving state-of-the-art results. The code, data, and models are available at http://gazecapture.csail.mit.edu.
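
For readers who want a concrete picture of what such an appearance-based regressor looks like, here is a minimal PyTorch sketch that maps a face crop to an on-screen gaze point in centimeters. It is illustrative only: the actual iTracker model uses separate eye, face, and face-grid input streams, and the layer sizes, input resolution, and training loop below are assumptions.

```python
# Minimal sketch of an appearance-based gaze regressor in the spirit of iTracker.
# The real model uses separate eye-crop, face, and face-grid streams; the single
# face stream, layer widths, and input size here are illustrative assumptions.
import torch
import torch.nn as nn

class GazeRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 2),  # (x, y) gaze point in cm relative to the camera
        )

    def forward(self, face):
        return self.head(self.features(face))

model = GazeRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
faces = torch.randn(8, 3, 224, 224)   # dummy batch of face crops
targets = torch.randn(8, 2)           # dummy ground-truth gaze points in cm
optimizer.zero_grad()
loss = torch.sqrt(((model(faces) - targets) ** 2).sum(dim=1)).mean()  # mean Euclidean error
loss.backward()
optimizer.step()
```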

473 citations


Journal ArticleDOI
TL;DR: This paper presents an efficient and very robust tracking algorithm using a single convolutional neural network for learning effective feature representations of the target object in a purely online manner and introduces a novel truncated structural loss function that maintains as many training samples as possible and reduces the risk of tracking error accumulation.
Abstract: Deep neural networks, despite their great success at feature learning in various computer vision tasks, are usually considered impractical for online visual tracking, because they require very long training times and a large number of training samples. In this paper, we present an efficient and very robust tracking algorithm using a single convolutional neural network (CNN) for learning effective feature representations of the target object in a purely online manner. Our contributions are multifold. First, we introduce a novel truncated structural loss function that maintains as many training samples as possible and reduces the risk of tracking error accumulation. Second, we enhance the ordinary stochastic gradient descent approach in CNN training with a robust sample selection mechanism. The sampling mechanism randomly generates positive and negative samples from different temporal distributions, which are constructed by taking temporal relations and label noise into account. Finally, a lazy yet effective updating scheme is designed for CNN training. Equipped with this novel updating algorithm, the CNN model is robust to long-standing difficulties in visual tracking, such as occlusions or incorrect detections, without losing its ability to adapt to significant appearance changes. In our experiments, the CNN tracker outperforms all compared state-of-the-art methods on two recently proposed benchmarks, which together involve over 60 video sequences. The remarkable performance improvement over existing trackers illustrates the superiority of the feature representations, which are learned purely online via the proposed deep learning framework.
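
The "truncated" loss idea can be illustrated with a small sketch: residuals below a threshold contribute nothing, so well-fit samples can stay in the training pool without dominating the gradient. The squared-error form and threshold value below are assumptions, not the exact loss used in the paper.

```python
# Sketch of a truncated regression loss: per-sample errors below a threshold are
# clipped to zero, so already well-fit samples produce no gradient. The squared-error
# form and the threshold value are assumptions, not the paper's exact formulation.
import torch

def truncated_loss(pred, target, threshold=0.1):
    per_sample = ((pred - target) ** 2).sum(dim=1)              # squared error per sample
    return torch.clamp(per_sample - threshold, min=0.0).mean()  # ignore small residuals

pred = torch.tensor([[0.10, 0.20], [0.90, 0.80]], requires_grad=True)
target = torch.tensor([[0.10, 0.20], [0.00, 0.00]])
loss = truncated_loss(pred, target)
loss.backward()  # only the poorly fit second sample contributes a gradient
```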

229 citations


Journal ArticleDOI
TL;DR: There is little evidence to suggest that individuals with autism do perceive faces holistically, and the eye avoidance hypothesis provides a plausible explanation of face recognition deficits where individuals with ASD avoid the eye region because it is perceived as socially threatening.
Abstract: Although a growing body of research indicates that children with autism spectrum disorder (ASD) exhibit selective deficits in their ability to recognize facial identities and expressions, the source of their face impairment is, as yet, undetermined. In this paper, we consider three possible accounts of the autism face deficit: (1) the holistic hypothesis, (2) the local perceptual bias hypothesis and (3) the eye avoidance hypothesis. A review of the literature indicates that contrary to the holistic hypothesis, there is little evidence to suggest that individuals with autism do perceive faces holistically. The local perceptual bias account also fails to explain the selective advantage that ASD individuals demonstrate for objects and their selective disadvantage for faces. The eye avoidance hypothesis provides a plausible explanation of face recognition deficits where individuals with ASD avoid the eye region because it is perceived as socially threatening. Direct eye contact elicits an increased physiological response, as indicated by heightened skin conductance and amygdala activity. For individuals with autism, avoiding the eyes is an adaptive strategy; however, it interferes with the ability to process facial cues of identity, expressions and intentions, exacerbating the social challenges for persons with ASD.

215 citations


Journal ArticleDOI
TL;DR: In this article, the authors revisited the popular performance measures and tracker performance visualizations and analyzed them theoretically and experimentally, showing that several measures are equivalent in terms of the information they provide for tracker comparison and, crucially, that some are more brittle than others.
Abstract: The problem of visual tracking evaluation sports a large variety of performance measures and largely suffers from a lack of consensus about which measures should be used in experiments. This makes cross-paper tracker comparison difficult. Furthermore, as some measures may be less effective than others, the tracking results may be skewed or biased toward particular tracking aspects. In this paper, we revisit the popular performance measures and tracker performance visualizations and analyze them theoretically and experimentally. We show that several measures are equivalent in terms of the information they provide for tracker comparison and, crucially, that some are more brittle than others. Based on our analysis, we narrow down the set of potential measures to only two complementary ones, describing accuracy and robustness, thus pushing toward homogenization of the tracker evaluation methodology. These two measures can be intuitively interpreted and visualized and have been employed by the recent visual object tracking challenges as the foundation for the evaluation methodology.
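
The two retained measures are straightforward to compute from per-frame bounding-box overlaps. The sketch below illustrates one common reading of them (mean overlap over successfully tracked frames for accuracy, failure count for robustness); the box format and the zero-overlap failure criterion are simplifying assumptions rather than the exact VOT protocol.

```python
# Sketch of per-sequence accuracy (mean overlap on successfully tracked frames)
# and robustness (number of failures). Boxes are (x, y, w, h); treating zero
# overlap as a failure is a simplification of the actual evaluation protocol.
def iou(a, b):
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def accuracy_and_robustness(predicted_boxes, ground_truth_boxes):
    overlaps = [iou(p, g) for p, g in zip(predicted_boxes, ground_truth_boxes)]
    failures = sum(1 for o in overlaps if o == 0.0)             # robustness: failure count
    tracked = [o for o in overlaps if o > 0.0]
    accuracy = sum(tracked) / len(tracked) if tracked else 0.0  # accuracy: mean overlap
    return accuracy, failures
```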

207 citations


Posted Content
TL;DR: This work proposes an appearance-based method that, in contrast to a long-standing line of work in computer vision, only takes the full face image as input, and encodes the face image using a convolutional neural network with spatial weights applied on the feature maps to flexibly suppress or enhance information in different facial regions.
Abstract: Eye gaze is an important non-verbal cue for human affect analysis. Recent gaze estimation work indicated that information from the full face region can benefit performance. Pushing this idea further, we propose an appearance-based method that, in contrast to a long-standing line of work in computer vision, only takes the full face image as input. Our method encodes the face image using a convolutional neural network with spatial weights applied on the feature maps to flexibly suppress or enhance information in different facial regions. Through extensive evaluation, we show that our full-face method significantly outperforms the state of the art for both 2D and 3D gaze estimation, achieving improvements of up to 14.3% on MPIIGaze and 27.7% on EYEDIAP for person-independent 3D gaze estimation. We further show that this improvement is consistent across different illumination conditions and gaze directions and particularly pronounced for the most challenging extreme head poses.
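
The spatial-weights idea can be sketched as a small branch that predicts one weight per feature-map location and rescales every channel accordingly. The layer sizes and the 1x1-convolution design below are assumptions, not the published architecture.

```python
# Sketch of a spatial-weights mechanism: a small 1x1-convolution branch predicts one
# weight per spatial location, which rescales all channels of the feature map so that
# informative facial regions are enhanced and others suppressed. Layer sizes are
# assumptions, not the published architecture.
import torch
import torch.nn as nn

class SpatialWeights(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.weight_branch = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1), nn.ReLU(),
            nn.Conv2d(channels // 2, 1, kernel_size=1), nn.ReLU(),
        )

    def forward(self, feature_maps):
        weights = self.weight_branch(feature_maps)  # (N, 1, H, W) weight map
        return feature_maps * weights               # broadcast over channels

features = torch.randn(2, 256, 13, 13)              # dummy CNN feature maps
weighted = SpatialWeights(256)(features)
```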

204 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: A novel attention-modulated visual tracking algorithm that decomposes an object into multiple cognitive units, and trains multiple elementary trackers in order to modulate the distribution of attention according to various feature and kernel types is presented.
Abstract: In this paper, we present a novel attention-modulated visual tracking algorithm that decomposes an object into multiple cognitive units, and trains multiple elementary trackers in order to modulate the distribution of attention according to various feature and kernel types. In the integration stage it recombines the units to memorize and recognize the target object effectively. With respect to the elementary trackers, we present a novel attentional feature-based correlation filter (AtCF) that focuses on distinctive attentional features. The effectiveness of the proposed algorithm is validated through experimental comparison with state-of-the-art methods on widely-used tracking benchmark datasets.

201 citations


Proceedings Article
09 Jul 2016
TL;DR: The findings show that WebGazer can learn from user interactions and that its accuracy is sufficient for approximating the user's gaze.
Abstract: We introduce WebGazer, an online eye tracker that uses common webcams already present in laptops and mobile devices to infer the eye-gaze locations of web visitors on a page in real time. The eye tracking model self-calibrates by watching web visitors interact with the web page and trains a mapping between features of the eye and positions on the screen. This approach aims to provide a natural experience to everyday users that is not restricted to laboratories and highly controlled user studies. WebGazer has two key components: a pupil detector that can be combined with any eye detection library, and a gaze estimator using regression analysis informed by user interactions. We perform a large remote online study and a small in-person study to evaluate WebGazer. The findings show that WebGazer can learn from user interactions and that its accuracy is sufficient for approximating the user's gaze. As part of this paper, we release the first eye tracking library that can be easily integrated in any website for real-time gaze interactions, usability studies, or web research.
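
The self-calibration idea can be sketched as follows: every user interaction (e.g., a click) yields a pair of eye features and a known screen position, and a regularized linear regression is refit on the accumulated pairs. This is only a schematic Python analogue of the JavaScript library; the feature representation and hyperparameters are assumptions.

```python
# Schematic self-calibrating gaze estimator: user interactions supply labeled
# training pairs, and a ridge regression maps eye features to screen coordinates.
# The feature vector (e.g., flattened eye-patch pixels) and alpha are assumptions.
import numpy as np
from sklearn.linear_model import Ridge

class InteractionCalibratedGaze:
    def __init__(self, alpha=1.0):
        self.model_x = Ridge(alpha=alpha)
        self.model_y = Ridge(alpha=alpha)
        self.features, self.targets = [], []

    def add_interaction(self, eye_features, click_xy):
        # Called on every click: the click position serves as the gaze label.
        self.features.append(np.asarray(eye_features, dtype=float))
        self.targets.append(click_xy)
        X, Y = np.vstack(self.features), np.asarray(self.targets, dtype=float)
        self.model_x.fit(X, Y[:, 0])
        self.model_y.fit(X, Y[:, 1])

    def predict(self, eye_features):
        X = np.asarray(eye_features, dtype=float).reshape(1, -1)
        return float(self.model_x.predict(X)[0]), float(self.model_y.predict(X)[0])
```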

192 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: A novel tracking method called Recurrently Target-attending Tracking (RTT), which attempts to identify and exploit those reliable parts which are beneficial for the overall tracking process and derives an efficient closed-form solution with a sharp reduction in computational complexity.
Abstract: Robust visual tracking is a challenging task in computer vision. Due to the accumulation and propagation of estimation error, model drifting often occurs and degrades the tracking performance. To mitigate this problem, in this paper we propose a novel tracking method called Recurrently Target-attending Tracking (RTT). RTT attempts to identify and exploit those reliable parts which are beneficial for the overall tracking process. To bypass occlusion and discover reliable components, multi-directional Recurrent Neural Networks (RNNs) are employed in RTT to capture long-range contextual cues by traversing a candidate spatial region from multiple directions. The confidence maps produced by the RNNs are employed to adaptively regularize the learning of discriminative correlation filters, suppressing cluttered background noise while making full use of the information from reliable parts. To solve the weighted correlation filters, we derive an efficient closed-form solution with a sharp reduction in computational complexity. Extensive experiments demonstrate that our proposed RTT compares favorably against other correlation filter based methods.
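
For background, the standard (unweighted) correlation filter has a well-known closed-form solution in the Fourier domain, sketched below in the MOSSE style. The paper's contribution is a weighted variant regularized by the RNN confidence maps, which is not reproduced here.

```python
# Background sketch of the standard correlation-filter closed-form solution in the
# Fourier domain (MOSSE style): H* = (G . conj(F)) / (F . conj(F) + lambda).
# The paper's weighted filters, regularized by RNN confidence maps, are not shown.
import numpy as np

def train_correlation_filter(patch, desired_response, lam=1e-2):
    F = np.fft.fft2(patch)                 # training patch in the Fourier domain
    G = np.fft.fft2(desired_response)      # desired (typically Gaussian) response
    return (G * np.conj(F)) / (F * np.conj(F) + lam)   # closed-form filter H*

def locate_target(filter_conj, patch):
    response = np.real(np.fft.ifft2(filter_conj * np.fft.fft2(patch)))
    return np.unravel_index(np.argmax(response), response.shape)  # peak = target position
```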

154 citations


Journal ArticleDOI
TL;DR: This work systematically maps pupil foreshortening error (PFE) using an artificial eye model and then applies a geometric model to correct it.
Abstract: Pupil size is correlated with a wide variety of important cognitive variables and is increasingly being used by cognitive scientists. Pupil data can be recorded inexpensively and non-invasively by many commonly used video-based eye-tracking cameras. Despite the relative ease of data collection and increasing prevalence of pupil data in the cognitive literature, researchers often underestimate the methodological challenges associated with controlling for confounds that can result in misinterpretation of their data. One serious confound that is often not properly controlled is pupil foreshortening error (PFE)—the foreshortening of the pupil image as the eye rotates away from the camera. Here we systematically map PFE using an artificial eye model and then apply a geometric model correction. Three artificial eyes with different fixed pupil sizes were used to systematically measure changes in pupil size as a function of gaze position with a desktop EyeLink 1000 tracker. A grid-based map of pupil measurements was recorded with each artificial eye across three experimental layouts of the eye-tracking camera and display. Large, systematic deviations in pupil size were observed across all nine maps. The measured PFE was corrected by a geometric model that expressed the foreshortening of the pupil area as a function of the cosine of the angle between the eye-to-camera axis and the eye-to-stimulus axis. The model reduced the root mean squared error of pupil measurements by 82.5 % when the model parameters were pre-set to the physical layout dimensions, and by 97.5 % when they were optimized to fit the empirical error surface.
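
The correction itself is a one-line geometric identity: the measured pupil area shrinks roughly with the cosine of the angle between the eye-to-camera axis and the eye-to-stimulus axis, so dividing by that cosine recovers the underlying area. A small worked sketch, with illustrative coordinates in centimeters:

```python
# Worked sketch of the cosine correction: measured pupil area is approximately the
# true area times cos(theta), where theta is the angle between the eye-to-camera and
# eye-to-stimulus axes, so dividing by cos(theta) undoes the foreshortening.
# The example geometry (eye at the origin, coordinates in cm) is illustrative.
import numpy as np

def foreshortening_corrected_area(measured_area, eye_pos, camera_pos, stimulus_pos):
    to_camera = np.asarray(camera_pos, dtype=float) - np.asarray(eye_pos, dtype=float)
    to_stimulus = np.asarray(stimulus_pos, dtype=float) - np.asarray(eye_pos, dtype=float)
    cos_theta = np.dot(to_camera, to_stimulus) / (
        np.linalg.norm(to_camera) * np.linalg.norm(to_stimulus))
    return measured_area / cos_theta

# Camera mounted below the display, gaze directed at the top of the screen.
corrected = foreshortening_corrected_area(
    measured_area=11.5,                 # pupil area reported by the tracker (arbitrary units)
    eye_pos=(0.0, 0.0, 0.0),
    camera_pos=(0.0, -25.0, 60.0),
    stimulus_pos=(0.0, 15.0, 60.0))
```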

140 citations


Journal ArticleDOI
11 Nov 2016
TL;DR: This work introduces a novel system for HMD users to control a digital avatar in real-time while producing plausible speech animation and emotional expressions, demonstrates the quality of the system on a variety of subjects, and evaluates its performance against state-of-the-art real-time facial tracking techniques.
Abstract: Significant challenges currently prohibit expressive interaction in virtual reality (VR). Occlusions introduced by head-mounted displays (HMDs) make existing facial tracking techniques intractable, and even state-of-the-art techniques used for real-time facial tracking in unconstrained environments fail to capture subtle details of the user's facial expressions that are essential for compelling speech animation. We introduce a novel system for HMD users to control a digital avatar in real-time while producing plausible speech animation and emotional expressions. Using a monocular camera attached to an HMD, we record multiple subjects performing various facial expressions and speaking several phonetically-balanced sentences. These images are used with artist-generated animation data corresponding to these sequences to train a convolutional neural network (CNN) to regress images of a user's mouth region to the parameters that control a digital avatar. To make training this system more tractable, we use audio-based alignment techniques to map images of multiple users making the same utterance to the corresponding animation parameters. We demonstrate that this approach is also feasible for tracking the expressions around the user's eye region with an internal infrared (IR) camera, thereby enabling full facial tracking. This system requires no user-specific calibration, uses easily obtainable consumer hardware, and produces high-quality animations of speech and emotional expressions. Finally, we demonstrate the quality of our system on a variety of subjects and evaluate its performance against state-of-the-art real-time facial tracking techniques.

Journal ArticleDOI
TL;DR: A stacked denoising autoencoder model is developed to learn robust, representative features from raw image data in an unsupervised manner, achieving both contrast inference and contrast integration simultaneously.
Abstract: Saliency detection models aiming to quantitatively predict human eye-attended locations in the visual field have been receiving increasing research interest in recent years. Unlike traditional methods that rely on hand-designed features and contrast inference mechanisms, this paper proposes a novel framework to learn saliency detection models from raw image data using deep networks. The proposed framework mainly consists of two learning stages. At the first learning stage, we develop a stacked denoising autoencoder (SDAE) model to learn robust, representative features from raw image data in an unsupervised manner. The second learning stage aims to jointly learn optimal mechanisms to capture the intrinsic mutual patterns as the feature contrast and to integrate them for final saliency prediction. Given the input of pairs of a center patch and its surrounding patches represented by the features learned at the first stage, a SDAE network is trained under the supervision of eye fixation labels, which achieves both contrast inference and contrast integration simultaneously. Experiments on three publicly available eye tracking benchmarks and the comparisons with 16 state-of-the-art approaches demonstrate the effectiveness of the proposed framework.
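
A single denoising-autoencoder layer of the kind stacked in this framework can be sketched in a few lines: the input patch is corrupted with noise and the network is trained to reconstruct the clean version. Patch size, hidden width, and noise level below are assumptions.

```python
# Sketch of one denoising-autoencoder layer of the kind stacked in this framework:
# the input is corrupted with Gaussian noise and the network reconstructs the clean
# patch. Patch size, hidden width, and noise level are assumptions.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, in_dim=768, hidden_dim=256, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        corrupted = x + self.noise_std * torch.randn_like(x)  # inject noise
        return self.decoder(self.encoder(corrupted))

dae = DenoisingAutoencoder()
patches = torch.rand(32, 768)                          # flattened 16x16x3 image patches
loss = nn.functional.mse_loss(dae(patches), patches)   # reconstruct the clean input
loss.backward()
```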

Journal ArticleDOI
09 Dec 2016-PLOS ONE
TL;DR: It is shown that it is possible to identify 9–10 year old individuals at risk of persistent reading difficulties by using eye tracking during reading to probe the processes that underlie reading ability and that eye movements in reading can be highly predictive of individual reading ability.
Abstract: Dyslexia is a neurodevelopmental reading disability estimated to affect 5-10% of the population. While there is yet no full understanding of the cause of dyslexia, or agreement on its precise definition, it is certain that many individuals suffer persistent problems in learning to read for no apparent reason. Although it is generally agreed that early intervention is the best form of support for children with dyslexia, there is still a lack of efficient and objective means to help identify those at risk during the early years of school. Here we show that it is possible to identify 9-10 year old individuals at risk of persistent reading difficulties by using eye tracking during reading to probe the processes that underlie reading ability. In contrast to current screening methods, which rely on oral or written tests, eye tracking does not depend on the subject to produce some overt verbal response and thus provides a natural means to objectively assess the reading process as it unfolds in real-time. Our study is based on a sample of 97 high-risk subjects with early identified word decoding difficulties and a control group of 88 low-risk subjects. These subjects were selected from a larger population of 2165 school children attending second grade. Using predictive modeling and statistical resampling techniques, we develop classification models from eye tracking records less than one minute in duration and show that the models are able to differentiate high-risk subjects from low-risk subjects with high accuracy. Although dyslexia is fundamentally a language-based learning disability, our results suggest that eye movements in reading can be highly predictive of individual reading ability and that eye tracking can be an efficient means to identify children at risk of long-term reading difficulties.
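
A schematic version of such a screening pipeline: summarize a short eye-tracking record of reading into fixation and saccade features, then cross-validate a classifier separating high-risk from low-risk readers. The feature set, classifier, and (random) data below are illustrative, not the study's actual model.

```python
# Schematic screening pipeline: condense a short reading record into fixation and
# saccade features, then cross-validate a classifier separating high-risk from
# low-risk readers. Features, classifier, and the random data are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def reading_features(fixation_durations_ms, saccade_lengths_chars):
    fd = np.asarray(fixation_durations_ms, dtype=float)
    sl = np.asarray(saccade_lengths_chars, dtype=float)
    return [fd.mean(), fd.std(), len(fd),   # fixation duration statistics and count
            sl.mean(),                      # mean saccade length
            np.mean(sl < 0)]                # proportion of regressions (leftward saccades)

rng = np.random.default_rng(0)
X = np.array([reading_features(rng.normal(230, 60, 80), rng.normal(7, 4, 79))
              for _ in range(50)])          # one feature row per child (dummy data)
y = rng.integers(0, 2, size=50)             # 1 = high risk, 0 = low risk
scores = cross_val_score(RandomForestClassifier(n_estimators=200), X, y, cv=5)
```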

Journal ArticleDOI
01 Nov 2016
TL;DR: The experimental results show that the algorithm ElSe (Fuhl et al. 2016) outperforms other pupil detection methods by a large margin, thus offering robust and accurate pupil positions on challenging everyday eye images.
Abstract: Robust and accurate detection of the pupil position is a key building block for head-mounted eye tracking and a prerequisite for applications on top, such as gaze-based human–computer interaction or attention analysis. Despite a large body of work, detecting the pupil in images recorded under real-world conditions is challenging given significant variability in the eye appearance (e.g., illumination, reflections, occlusions, etc.), individual differences in eye physiology, as well as other sources of noise, such as contact lenses or make-up. In this paper we review six state-of-the-art pupil detection methods, namely ElSe (Fuhl et al. in Proceedings of the ninth biennial ACM symposium on eye tracking research & applications, ACM, New York, NY, USA, pp 123–130, 2016), ExCuSe (Fuhl et al. in Computer analysis of images and patterns. Springer, New York, pp 39–51, 2015), Pupil Labs (Kassner et al. in Adjunct proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing (UbiComp), pp 1151–1160, 2014. doi:10.1145/2638728.2641695), SET (Javadi et al. in Front Neuroeng 8, 2015), Starburst (Li et al. in Computer vision and pattern recognition workshops, 2005. IEEE Computer Society conference on CVPR workshops. IEEE, pp 79–79, 2005), and Świrski (Świrski et al. in Proceedings of the symposium on eye tracking research and applications (ETRA). ACM, pp 173–176, 2012. doi:10.1145/2168556.2168585). We compare their performance on a large-scale data set consisting of 225,569 annotated eye images taken from four publicly available data sets. Our experimental results show that the algorithm ElSe (Fuhl et al. 2016) outperforms other pupil detection methods by a large margin, thus offering robust and accurate pupil positions on challenging everyday eye images.

Journal ArticleDOI
TL;DR: This work proposes a multi-view correlation tracker that combines features from distinct views to do tracking via correlation filters and introduces a simple but effective scale-variation detection mechanism, which strengthens the stability of scale variation tracking.
Abstract: The first contribution is to combine features from distinct views to do tracking via correlation filters; the fusion method is induced by minimizing the Kullback-Leibler (KL) divergence under a probabilistic framework. The second contribution is a simple and effective scale evaluation model. Robustness and efficiency are the two main goals of existing trackers. Most robust trackers are implemented with combined features or models accompanied with a high computational cost. To achieve a robust and efficient tracking performance, we propose a multi-view correlation tracker. On one hand, the robustness of the tracker is enhanced by the multi-view model, which fuses several features and selects the more discriminative features to do tracking. On the other hand, the correlation filter framework provides fast training and efficient target locating. The multiple features are fused on the model level of the correlation filter, which is effective and efficient. In addition, we introduce a simple but effective scale-variation detection mechanism, which strengthens the stability of scale variation tracking. We evaluate our tracker on the online tracking benchmark (OTB) and two visual object tracking benchmarks (VOT2014, VOT2015). These three datasets contain more than 100 video sequences in total. On all three datasets, the proposed approach achieves promising performance.

Journal ArticleDOI
TL;DR: A proposed system extracts facial features and classifies their spatial configuration into six regions in real time and achieves an average accuracy of 91.4 percent at an average decision rate of 11 Hz on a dataset of 50 drivers from an on-road study.
Abstract: Automated estimation of the allocation of a driver's visual attention could be a critical component of future advanced driver assistance systems. In theory, vision-based tracking of the eye can provide a good estimate of gaze location. In practice, however, eye tracking from video is challenging because of sunglasses, eyeglass reflections, lighting conditions, occlusions, motion blur, and other factors. Estimation of head pose, on the other hand, is robust to many of these effects but cannot provide as fine-grained a resolution for localizing gaze. For the purpose of keeping the driver safe, it is sufficient to partition gaze into regions. In this effort, the proposed system extracts facial features and classifies their spatial configuration into six regions in real time. The proposed method achieves an average accuracy of 91.4 percent at an average decision rate of 11 Hz on a dataset of 50 drivers from an on-road study.

Journal ArticleDOI
TL;DR: The automatically detected mind wandering rate correlated negatively with measures of learning and transfer even after controlling for prior knowledge, thereby providing evidence of predictive validity.
Abstract: Mind wandering is a ubiquitous phenomenon where attention involuntarily shifts from task-related thoughts to internal task-unrelated thoughts. Mind wandering can have negative effects on performance; hence, intelligent interfaces that detect mind wandering can improve performance by intervening and restoring attention to the current task. We investigated the use of eye gaze and contextual cues to automatically detect mind wandering during reading with a computer interface. Participants were pseudorandomly probed to report mind wandering while an eye tracker recorded their gaze during the reading task. Supervised machine learning techniques detected positive responses to mind wandering probes from eye gaze and context features in a user-independent fashion. Mind wandering was detected with an accuracy of 72 % (expected accuracy by chance was 60 %) when probed at the end of a page and an accuracy of 67 % (chance was 59 %) when probed in the midst of reading a page. Global gaze features (gaze patterns independent of content, such as fixation durations) were more effective than content-specific local gaze features. An analysis of the features revealed diagnostic patterns of eye gaze behavior during mind wandering: (1) certain types of fixations were longer; (2) reading times were longer than expected; (3) more words were skipped; and (4) there was a larger variability in pupil diameter. Finally, the automatically detected mind wandering rate correlated negatively with measures of learning and transfer even after controlling for prior knowledge, thereby providing evidence of predictive validity. Possible improvements to the detector and applications that utilize the detector are discussed.
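
One detail worth illustrating is the user-independent evaluation: each participant's probes must fall entirely in either the training or the test fold. A sketch using grouped cross-validation, with placeholder features standing in for the paper's global gaze features:

```python
# Sketch of user-independent validation: probes from one participant never appear in
# both training and test folds (GroupKFold). The four placeholder features stand in
# for the paper's global gaze features; all data here are random.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(1)
n_probes = 300
X = rng.normal(size=(n_probes, 4))          # e.g., mean fixation duration, words skipped,
                                            # reading-time deviation, pupil-diameter SD
y = rng.integers(0, 2, size=n_probes)       # 1 = mind wandering reported at the probe
groups = rng.integers(0, 30, size=n_probes) # participant id for each probe

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         groups=groups, cv=GroupKFold(n_splits=5), scoring="accuracy")
```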

Journal ArticleDOI
TL;DR: This book is an instruction manual for the would-be researcher, offering answers to a host of critical technical questions about eye movement recording, and is well worth the modest purchase price.
Abstract: Fifty years ago it was virtually impossible to carry out funded work in the United Kingdom on eye movement control in reading. For example, the Social Science Research Council at that time, possibl...

Journal ArticleDOI
TL;DR: A complete statistical model of the eye movements was built, revealing very little systematic variation in eye movements over the time course of a choice or across different choices, which is inconsistent with prospect theory, the priority heuristic, or decision field theory.
Abstract: We asked participants to make simple risky choices while we recorded their eye movements. We built a complete statistical model of the eye movements and found very little systematic variation in eye movements over the time course of a choice or across the different choices. The only exceptions were finding more (of the same) eye movements when choice options were similar, and an emerging gaze bias in which people looked more at the gamble they ultimately chose. These findings are inconsistent with prospect theory, the priority heuristic, or decision field theory. However, the eye movements made during a choice have a large relationship with the final choice, and this is mostly independent from the contribution of the actual attribute values in the choice options. That is, eye movements tell us not just about the processing of attribute values but also are independently associated with choice. The pattern is simple—people choose the gamble they look at more often, independently of the actual numbers they see—and this pattern is simpler than predicted by decision field theory, decision by sampling, and the parallel constraint satisfaction model.

Journal ArticleDOI
TL;DR: Whether AOI size influences the measurement of object attention and the conclusions drawn about cognitive processes is tested, and a guideline for the use of AOIs in behavioral eye-tracking research is provided.
Abstract: Decision researchers frequently analyze attention to individual objects to test hypotheses about underlying cognitive processes. Generally, fixations are assigned to objects using a method known as area of interest (AOI). Ideally, an AOI includes all fixations belonging to an object while fixations to other objects are excluded. Unfortunately, due to measurement inaccuracy and insufficient distance between objects, the distributions of fixations to objects may overlap, resulting in a signal detection problem. If the AOI is to include all fixations to an object, it will also likely include fixations belonging to other objects (false positives). In a survey, we find that many researchers report testing multiple AOI sizes when performing analyses, presumably trying to balance the proportion of true and false positive fixations. To test whether AOI size influences the measurement of object attention and conclusions drawn about cognitive processes, we reanalyze four published studies and conduct a fifth tailored to our purpose. We find that in studies in which we expected overlapping fixation distributions, analyses benefited from smaller AOI sizes (0° visual angle margin). In studies where we expected no overlap, analyses benefited from larger AOI sizes (>0.5° visual angle margins). We conclude with a guideline for the use of AOIs in behavioral eye-tracking research.
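
Margin-based AOI assignment is easy to express directly; the sketch below expands each object rectangle by a margin given in degrees of visual angle before testing whether a fixation falls inside. The pixels-per-degree constant is an assumption about the setup.

```python
# Sketch of margin-based AOI assignment: each object rectangle is expanded by a
# margin given in degrees of visual angle before testing whether a fixation falls
# inside. The pixels-per-degree constant is an assumption about the recording setup.
def assign_fixation_to_aois(fix_x, fix_y, aois, margin_deg=0.5, px_per_deg=35.0):
    """aois maps an object name to its (x, y, width, height) rectangle in pixels."""
    m = margin_deg * px_per_deg
    hits = []
    for name, (x, y, w, h) in aois.items():
        if (x - m) <= fix_x <= (x + w + m) and (y - m) <= fix_y <= (y + h + m):
            hits.append(name)
    # With enlarged AOIs a fixation may land in several of them (false positives);
    # the list of hits makes that overlap explicit instead of silently resolving it.
    return hits

aois = {"option_A": (200, 300, 250, 120), "option_B": (700, 300, 250, 120)}
print(assign_fixation_to_aois(460, 350, aois, margin_deg=0.5))   # hits option_A only
```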

Journal ArticleDOI
TL;DR: The gender of both the participant and the person being observed are the factors that most influence gaze patterns during face exploration, and it is demonstrated that female gazers follow a much more exploratory scanning strategy than males watching videos of another person.
Abstract: The human face is central to our everyday social interactions. Recent studies have shown that while gazing at faces, each one of us has a particular eye-scanning pattern, highly stable across time. Although variables such as culture or personality have been shown to modulate gaze behavior, we still don't know what shapes these idiosyncrasies. Moreover, most previous observations rely on static analyses of small-sized eye-position data sets averaged across time. Here, we probe the temporal dynamics of gaze to explore what information can be extracted about the observers and what is being observed. Controlling for any stimuli effect, we demonstrate that among many individual characteristics, the gender of both the participant (gazer) and the person being observed (actor) are the factors that most influence gaze patterns during face exploration. We record and exploit the largest set of eye-tracking data (405 participants, 58 nationalities) from participants watching videos of another person. Using novel data-mining techniques, we show that female gazers follow a much more exploratory scanning strategy than males. Moreover, female gazers watching female actresses look more at the eye on the left side. These results have strong implications in every field using gaze-based models from computer vision to clinical psychology.

Proceedings ArticleDOI
24 Oct 2016
TL;DR: It is argued that eye tracking should be considered as a valuable instrument to analyze cognitive processes in visual computing and suggest future research directions to tackle outstanding issues.
Abstract: In this position paper we encourage the use of eye tracking measurements to investigate users' cognitive load while interacting with a system. We start with an overview of how eye movements can be interpreted to provide insight about cognitive processes and present a descriptive model representing the relations of eye movements and cognitive load. Then, we discuss how specific characteristics of human-computer interaction (HCI) interfere with the model and impede the application of eye tracking data to measure cognitive load in visual computing. As a result, we present a refined model, embedding the characteristics of HCI into the relation of eye tracking data and cognitive load. Based on this, we argue that eye tracking should be considered as a valuable instrument to analyze cognitive processes in visual computing and suggest future research directions to tackle outstanding issues.

Journal ArticleDOI
TL;DR: A novel system that enables a person with motor disability to control a wheelchair via eye-gaze and to provide a continuous, real-time navigation in unknown environments is proposed.
Abstract: Thanks to advances in electric wheelchair design, persons with motor impairments due to diseases such as amyotrophic lateral sclerosis (ALS) have tools to become more independent and mobile. However, an electric wheelchair generally requires considerable skill to learn how to use and operate. Moreover, some persons with motor disabilities cannot drive an electric wheelchair manually (even with a joystick), because they lack the physical ability to control their hand movement (such is the case with people with ALS). In this paper, we propose a novel system that enables a person with motor disability to control a wheelchair via eye-gaze and provides continuous, real-time navigation in unknown environments. The system comprises a Permobile M400 wheelchair, eye tracking glasses, a depth camera to capture the geometry of the ambient space, a set of ultrasound and infrared sensors to detect obstacles with low proximity that are out of the field of view of the depth camera, a laptop placed on a flexible mount for maximized comfort, and a safety off switch to turn off the system whenever needed. First, a novel algorithm is proposed to support continuous, real-time target identification, path planning, and navigation in unknown environments. Second, the system utilizes a novel N-cell grid-based graphical user interface that adapts to input/output interface specifications. Third, a calibration method for the eye tracking system is implemented to minimize calibration overheads. A case study with a person with ALS is presented, and interesting findings are discussed. The participant showed improved performance in terms of calibration time, task completion time, and navigation speed for navigation trips between the office, dining room, and bedroom. Furthermore, debriefing the caregiver also showed promising results: the participant enjoyed a higher level of confidence driving the wheelchair and experienced no collisions throughout the experiment.

Proceedings ArticleDOI
14 Mar 2016
TL;DR: ElSe, a novel algorithm based on ellipse evaluation of a filtered edge image that can be integrated in embedded architectures, e.g., driving, is proposed and evaluated against four state-of-the-art methods.
Abstract: Fast and robust pupil detection is an essential prerequisite for video-based eye-tracking in real-world settings. Several algorithms for image-based pupil detection have been proposed in the past, their applicability, however, is mostly limited to laboratory conditions. In real-world scenarios, automated pupil detection has to face various challenges, such as illumination changes, reflections (on glasses), make-up, non-centered eye recording, and physiological eye characteristics. We propose ElSe, a novel algorithm based on ellipse evaluation of a filtered edge image. We aim at a robust, inexpensive approach that can be integrated in embedded architectures, e.g., driving. The proposed algorithm was evaluated against four state-of-the-art methods on over 93,000 hand-labeled images from which 55,000 are new eye images contributed by this work. On average, the proposed method achieved a 14.53% improvement on the detection rate relative to the best state-of-the-art performer. Algorithm and data sets are available for download: ftp://emmapupildata@messor.informatik.uni-tuebingen.de (password:eyedata).
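
To make the general pattern concrete (edge image, then evaluation of candidate ellipses), here is a rough OpenCV sketch. It is emphatically not the published ElSe algorithm, whose edge filtering and ellipse scoring are far more elaborate; thresholds and the scoring heuristic are guesses.

```python
# Rough OpenCV sketch of the general pattern (edge image, then evaluation of candidate
# ellipses). This is NOT the published ElSe algorithm, whose edge filtering and ellipse
# scoring are far more elaborate; the thresholds and scoring heuristic are guesses.
import cv2
import numpy as np

def rough_pupil_ellipse(eye_gray):
    blurred = cv2.GaussianBlur(eye_gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 40, 80)                     # filtered edge image
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    best, best_score = None, 0.0
    for contour in contours:
        if len(contour) < 5:                               # fitEllipse needs >= 5 points
            continue
        ellipse = cv2.fitEllipse(contour)
        (cx, cy), axes, _angle = ellipse
        major, minor = max(axes), min(axes)
        h, w = blurred.shape
        if major == 0 or not (0 <= int(cx) < w and 0 <= int(cy) < h):
            continue
        roundness = minor / major                            # 1.0 for a perfect circle
        darkness = 255.0 - float(blurred[int(cy), int(cx)])  # pupils are dark
        score = roundness * darkness
        if score > best_score:
            best, best_score = ellipse, score
    return best                                            # ((cx, cy), axes, angle) or None
```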

Proceedings ArticleDOI
01 Dec 2016
TL;DR: An investigation of the impact of deep motion features in a tracking-by-detection framework, showing that hand-crafted, deep RGB, and deep motion features contain complementary information and that fusing them improves visual tracking performance.
Abstract: Robust visual tracking is a challenging computer vision problem, with many real-world applications. Most existing approaches employ hand-crafted appearance features, such as HOG or Color Names. Recently, deep RGB features extracted from convolutional neural networks have been successfully applied for tracking. Despite their success, these features only capture appearance information. On the other hand, motion cues provide discriminative and complementary information that can improve tracking performance. Contrary to visual tracking, deep motion features have been successfully applied for action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. This paper presents an investigation of the impact of deep motion features in a tracking-by-detection framework. We further show that hand-crafted, deep RGB, and deep motion features contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly suggest that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone.

Journal ArticleDOI
TL;DR: The presence of systematic changes in pupil size was confirmed, again at both small and large scales, along with their tight relationship with gaze position estimates when observers were engaged in a demanding visual discrimination task.

Patent
26 Sep 2016
TL;DR: In this article, a method for mapping an input device to a virtual object in virtual space displayed on a display device is described, based on the user's gaze direction being directed to the virtual object.
Abstract: A method for mapping an input device to a virtual object in virtual space displayed on a display device is disclosed. The method may include determining, via an eye tracking device, a gaze direction of a user. The method may also include, based at least in part on the gaze direction being directed to a virtual object in virtual space displayed on a display device, modifying an action to be taken by one or more processors in response to receiving a first input from an input device. The method may further include, thereafter, in response to receiving the input from the input device, causing the action to occur, wherein the action correlates the first input to an interaction with the virtual object.

Journal ArticleDOI
TL;DR: An update of the studyforrest dataset is presented that complements the previously released functional magnetic resonance imaging data for natural language processing with a new two-hour 3 Tesla fMRI acquisition while 15 of the original participants were shown an audio-visual version of the stimulus motion picture.
Abstract: Here we present an update of the studyforrest (http://studyforrest.org) dataset that complements the previously released functional magnetic resonance imaging (fMRI) data for natural language processing with a new two-hour 3 Tesla fMRI acquisition while 15 of the original participants were shown an audio-visual version of the stimulus motion picture. We demonstrate with two validation analyses that these new data support modeling specific properties of the complex natural stimulus, as well as a substantial within-subject BOLD response congruency in brain areas related to the processing of auditory inputs, speech, and narrative when compared to the existing fMRI data for audio-only stimulation. In addition, we provide participants' eye gaze location as recorded simultaneously with fMRI, and an additional sample of 15 control participants whose eye gaze trajectories for the entire movie were recorded in a lab setting, to enable studies on attentional processes and comparative investigations on the potential impact of the stimulation setting on these processes.

Journal ArticleDOI
TL;DR: By resolving some of the methodological problems involved, this work aims to facilitate the transition from the traditional stimulus-response paradigm to the study of visual perception in more naturalistic conditions.

Journal ArticleDOI
TL;DR: This paper presents a review of eye-tracking in applied linguistics and second language research, focusing on what eye-tracking can and cannot be used for and offering guidelines for designing sound research studies using the technology.
Abstract: With eye-tracking technology the eye is thought to give researchers a window into the mind. Importantly, eye-tracking has significant advantages over traditional online processing measures: chiefly that it allows for more ‘natural’ processing as it does not require a secondary task, and that it provides a very rich moment-to-moment data source. In recognition of the technology’s benefits, an ever increasing number of researchers in applied linguistics and second language research are beginning to use it. As eye-tracking gains traction in the field, it is important to ensure that it is established in an empirically sound fashion. To do this it is important for the field to come to an understanding about what eye-tracking is, what eye-tracking measures tell us, what it can be used for, and what different eye-tracking systems can and cannot do. Further, it is important to establish guidelines for designing sound research studies using the technology. The goal of the current review is to begin to address these issues.