
Showing papers on "Eye tracking published in 2022"


Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a tracking framework combining correlation filter tracking and Siamese-based object tracking, which achieved state-of-the-art performance on the TC128 dataset.

73 citations


Journal ArticleDOI
TL;DR: In this paper, the authors systematically investigate the current DL-based visual tracking methods, benchmark datasets, and evaluation metrics, and extensively evaluate and analyze the leading visual tracking algorithms.
Abstract: Visual target tracking is one of the most sought-after yet challenging research topics in computer vision. Given the ill-posed nature of the problem and its popularity in a broad range of real-world scenarios, a number of large-scale benchmark datasets have been established, on which considerable methods have been developed and demonstrated with significant progress in recent years -- predominantly by recent deep learning (DL)-based methods. This survey aims to systematically investigate the current DL-based visual tracking methods, benchmark datasets, and evaluation metrics. It also extensively evaluates and analyzes the leading visual tracking methods. First, the fundamental characteristics, primary motivations, and contributions of DL-based methods are summarized from nine key aspects of: network architecture, network exploitation, network training for visual tracking, network objective, network output, exploitation of correlation filter advantages, aerial-view tracking, long-term tracking, and online tracking. Second, popular visual tracking benchmarks and their respective properties are compared, and their evaluation metrics are summarized. Third, the state-of-the-art DL-based methods are comprehensively examined on a set of well-established benchmarks of OTB2013, OTB2015, VOT2018, LaSOT, UAV123, UAVDT, and VisDrone2019. Finally, by conducting critical analyses of these state-of-the-art trackers quantitatively and qualitatively, their pros and cons under various common scenarios are investigated. It may serve as a gentle use guide for practitioners to weigh when and under what conditions to choose which method(s). It also facilitates a discussion on ongoing issues and sheds light on promising research directions.

70 citations
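
The benchmark comparisons this survey performs (OTB, VOT, LaSOT, and others) typically rest on simple overlap metrics. As a minimal illustration, not code from the paper, the sketch below computes the OTB-style success rate: per-frame intersection-over-union between predicted and ground-truth boxes, then the fraction of frames above an overlap threshold. The (x, y, w, h) box format is an assumption.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x, y, w, h)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    x1, y1 = max(ax1, bx1), max(ay1, by1)
    x2, y2 = min(ax1 + aw, bx1 + bw), min(ay1 + ah, by1 + bh)
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose overlap exceeds the threshold."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return float((overlaps > threshold).mean())
```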


Journal ArticleDOI
TL;DR: In this article, the authors present a review of how the various aspects of any study using an eye tracker (such as the instrument, methodology, environment, participant, etc.) affect the quality of the recorded eye-tracking data and the obtained eye-movement and gaze measures.
Abstract: In this paper, we present a review of how the various aspects of any study using an eye tracker (such as the instrument, methodology, environment, participant, etc.) affect the quality of the recorded eye-tracking data and the obtained eye-movement and gaze measures. We take this review to represent the empirical foundation for reporting guidelines of any study involving an eye tracker. We compare this empirical foundation to five existing reporting guidelines and to a database of 207 published eye-tracking studies. We find that reporting guidelines vary substantially and do not match with actual reporting practices. We end by deriving a minimal, flexible reporting guideline based on empirical research (Section “An empirically based minimal reporting guideline”).

53 citations


Journal ArticleDOI
Zhongxu Hu1, Chen Lv1, Peng Hang1, Chao Huang1, Yang Xing1 
TL;DR: This article proposes a driver attention estimation method based on a dual-view scene with calibration-free gaze direction, shown to be feasible and better than state-of-the-art methods on multiple widely used metrics.
Abstract: Driver attention estimation is one of the key technologies for intelligent vehicles. The existing related methods only focus on the scene image or the driver's gaze or head pose. The purpose of this article is to propose a more reasonable and feasible method based on a dual-view scene with calibration-free gaze direction. According to human visual mechanisms, the low-level features, static visual saliency map, and dynamic optical flow information are extracted as input feature maps, which combine the high-level semantic descriptions and a gaze probability map transformed from the gaze direction. A multiresolution neural network is proposed to handle the calibration-free features. The proposed method is verified on a virtual reality experimental platform that collected more than 550,000 samples and obtained a more accurate ground truth. The experiments show that the proposed method is feasible and better than the state-of-the-art methods based on multiple widely used metrics. This study also provides a discussion of the effects of different landscapes, times, and weather conditions on the performance.

48 citations


Journal ArticleDOI
TL;DR: A comprehensive overview of state-of-the-art tracking frameworks including both deep and non-deep trackers is provided in this article, where the authors present both quantitative and qualitative tracking results of various trackers on five benchmark datasets.

39 citations


Journal ArticleDOI
TL;DR: A review of gaze interaction and eye tracking research related to Extended Reality (XR) can be found in this article, where the authors outline efforts to apply eye gaze for direct interaction with virtual content and design of attentive interfaces that adapt the presented content based on eye gaze behavior.
Abstract: With innovations in the field of gaze and eye tracking, a new concentration of research in the area of gaze-tracked systems and user interfaces has formed in the field of Extended Reality (XR). Eye trackers are being used to explore novel forms of spatial human–computer interaction, to understand human attention and behavior, and to test expectations and human responses. In this article, we review gaze interaction and eye tracking research related to XR that has been published since 1985, which includes a total of 215 publications. We outline efforts to apply eye gaze for direct interaction with virtual content and design of attentive interfaces that adapt the presented content based on eye gaze behavior and discuss how eye gaze has been utilized to improve collaboration in XR. We outline trends and novel directions and discuss representative high-impact papers in detail.

24 citations


Journal ArticleDOI
11 Jan 2022
TL;DR: The main oculomotor events studied in the literature are described, along with the characteristics exploited by different measures, the benefits and practical challenges of these measures, and recommendations on future eye-tracking research directions.
Abstract: Our subjective visual experiences involve complex interaction between our eyes, our brain, and the surrounding world. It gives us the sense of sight, color, stereopsis, distance, pattern recognition, motor coordination, and more. The increasing ubiquity of gaze-aware technology brings with it the ability to track gaze and pupil measures with varying degrees of fidelity. With this in mind, a review that considers the various gaze measures becomes increasingly relevant, especially considering our ability to make sense of these signals given different spatio-temporal sampling capacities. In this paper, we selectively review prior work on eye movements and pupil measures. We first describe the main oculomotor events studied in the literature, and their characteristics exploited by different measures. Next, we review various eye movement and pupil measures from prior literature. Finally, we discuss our observations based on applications of these measures, the benefits and practical challenges involving these measures, and our recommendations on future eye-tracking research directions.

22 citations
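
The oculomotor events this review covers are typically extracted from raw gaze samples by event-detection algorithms. Below is a minimal sketch of one common approach, a dispersion-threshold (I-DT) fixation detector; the thresholds, sample format, and function name are illustrative assumptions, not the review's own method.

```python
import numpy as np

def detect_fixations(x, y, t, max_dispersion=1.0, min_duration=0.1):
    """Return (start_time, end_time, cx, cy) per detected fixation.

    x, y, t: numpy arrays of gaze coordinates (deg) and timestamps (s).
    """
    fixations, i, n = [], 0, len(t)
    while i < n:
        j = i
        # Grow the window while gaze stays within the dispersion limit.
        while j + 1 < n:
            wx, wy = x[i:j + 2], y[i:j + 2]
            dispersion = (wx.max() - wx.min()) + (wy.max() - wy.min())
            if dispersion > max_dispersion:
                break
            j += 1
        if t[j] - t[i] >= min_duration:
            fixations.append((t[i], t[j], x[i:j + 1].mean(), y[i:j + 1].mean()))
            i = j + 1
        else:
            i += 1
    return fixations
```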


Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper examined the effectiveness of endorsement by a local celebrity vs. a celebrity from tourist source countries on tourist response, as well as the moderating role of regulatory focus on the celebrity endorsement effect.
Abstract:
• Examines the effectiveness of different types of celebrity endorsement.
• Combines eye-tracking techniques with self-report method.
• A source celebrity is more effective in eliciting tourist responses.
• Examines a serial mediation model underlying the main effect.
• Examines the moderating role of regulatory focus.
Celebrity endorsement has become ubiquitous in international destination marketing, but studies have rarely assessed the differences between the effects of endorsement by a local celebrity vs. a celebrity from tourist source countries (source celebrity). To bridge the literature gap, this paper draws on meaning transfer theory and the match-up hypothesis to examine the effectiveness of these two types of celebrity endorsement by exploring the underlying mechanism. By carrying out a pretest and three experiments, the eye-tracking results and lab experiments show that international destination marketing involving a source celebrity (vs. local celebrity) can significantly increase tourists’ visual attention to the advertised destination scenery, positive attitudes toward the destination and intentions to visit. Furthermore, a serial mediation model of celebrity endorsement → celebrity–tourist congruency → tourist attitude toward celebrity → tourist response is also identified. Finally, this paper empirically examines the moderating role of regulatory focus on the celebrity endorsement effect.

18 citations


Journal ArticleDOI
TL;DR: A comprehensive survey of the single-user and multi-user gaze estimation approaches with deep learning is presented in this article, where state-of-the-art approaches are analyzed based on deep learning model architectures, coordinate systems, environmental constraints, datasets and performance evaluation metrics.
Abstract: Human gaze estimation plays a major role in many applications in human–computer interaction and computer vision by identifying the users’ point-of-interest. Revolutionary developments of deep learning have captured significant attention in gaze estimation literature. Gaze estimation techniques have progressed from single-user constrained environments to multi-user unconstrained environments with the applicability of deep learning techniques in complex unconstrained environments with extensive variations. This paper presents a comprehensive survey of the single-user and multi-user gaze estimation approaches with deep learning. State-of-the-art approaches are analyzed based on deep learning model architectures, coordinate systems, environmental constraints, datasets and performance evaluation metrics. A key outcome from this survey realizes the limitations, challenges and future directions of multi-user gaze estimation techniques. Furthermore, this paper serves as a reference point and a guideline for future multi-user gaze estimation research.

18 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a systematic and thorough review of more than 90 discriminative correlation filters (DCFs) and deep Siamese networks (SNs) based on results in nine tracking benchmarks.
Abstract: Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, given only its initial location, and segmentation, or its rough approximation in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as dominating tracking paradigms, which have led to significant progress. Following the rapid evolution of visual object tracking in the last decade, this survey presents a systematic and thorough review of more than 90 DCFs and Siamese trackers, based on results in nine tracking benchmarks. First, we present the background theory of both the DCF and Siamese tracking core formulations. Then, we distinguish and comprehensively review the shared as well as specific open research challenges in both these tracking paradigms. Furthermore, we thoroughly analyze the performance of DCF and Siamese trackers on nine benchmarks, covering different experimental aspects of visual tracking: datasets, evaluation metrics, performance, and speed comparisons. We finish the survey by presenting recommendations and suggestions for distinguished open challenges based on our analysis.

17 citations
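
The DCF core formulation this survey reviews can be illustrated compactly. The sketch below follows the classic MOSSE formulation, solving the filter in the Fourier domain as H* = (G · conj(F)) / (F · conj(F) + λ); it omits the multi-channel features, cosine windowing, and online updates that real DCF trackers use, and all names are illustrative.

```python
import numpy as np

def train_filter(patch, target_response, lam=1e-2):
    """Solve for the conjugate correlation filter H* in the Fourier domain."""
    F = np.fft.fft2(patch)              # image patch spectrum
    G = np.fft.fft2(target_response)    # desired (Gaussian) response spectrum
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(patch, H_conj):
    """Correlate a new patch with the filter; the response peak is the target."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))
    return np.unravel_index(np.argmax(response), response.shape)
```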


Journal ArticleDOI
TL;DR: In this article, a distilled Siamese tracking framework is proposed to learn small, fast and accurate trackers (students), which capture critical knowledge from large Siamese trackers (teachers) by a teacher-students knowledge distillation model.
Abstract: In recent years, Siamese network based trackers have significantly advanced the state-of-the-art in real-time tracking. Despite their success, Siamese trackers tend to suffer from high memory costs, which restrict their applicability to mobile devices with tight memory budgets. To address this issue, we propose a distilled Siamese tracking framework to learn small, fast and accurate trackers (students), which capture critical knowledge from large Siamese trackers (teachers) by a teacher-students knowledge distillation model. This model is intuitively inspired by the one teacher versus multiple students learning method typically employed in schools. In particular, our model contains a single teacher-student distillation module and a student-student knowledge sharing mechanism. The former is designed using a tracking-specific distillation strategy to transfer knowledge from a teacher to students. The latter is utilized for mutual learning between students to enable in-depth knowledge understanding. Extensive empirical evaluations on several popular Siamese trackers demonstrate the generality and effectiveness of our framework. Moreover, the results on five tracking benchmarks show that the proposed distilled trackers achieve compression rates of up to 18× and frame-rates of 265 FPS, while obtaining comparable tracking accuracy compared to base models.
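
A hedged sketch of the generic teacher-student distillation objective underlying frameworks like the one above: the student matches the teacher's softened outputs while also fitting ground-truth labels. The temperature, loss weighting, and the paper's tracking-specific transfer terms are assumptions here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: standard supervised loss on the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```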

Journal ArticleDOI
TL;DR: In this article, the impact of different AR modalities in terms of information mode and interaction modality (i.e. video vs. 3D animation) on user performance, workload, eye gaze behaviours, and usability during a maintenance assembly task was investigated.

Journal ArticleDOI
TL;DR: This work proposes a novel object-uncertainty policy, inspired by the Siamese trackers, that can be embedded into DCF-like methods to improve tracking performance and is capable of preventing the model from learning the background information.

Journal ArticleDOI
TL;DR: Results confirm that CADe systems detect polyps faster than humans, however, use of CADe did not improve human reaction times and increased misinterpretation of normal mucosa and decreased the eye travel distance.
Abstract: Background: Multiple computer-aided systems for polyp detection (CADe) have been introduced into clinical practice, with an unclear effect on examiner behavior. This study aimed to measure the influence of a CADe system on reaction time, mucosa misinterpretation, and changes in visual gaze pattern. Methods: Participants with variable levels of colonoscopy experience viewed video sequences (n = 29) while eye movement was tracked. Using a crossover design, videos were presented in two assessments, with and without CADe support. Reaction time for polyp detection and eye-tracking metrics were evaluated. Results: 21 participants performed 1218 experiments. CADe was significantly faster in detecting polyps compared with participants (median 1.16 seconds [99%CI 0.40–3.43] vs. 2.97 seconds [99%CI 2.53–3.77], respectively). However, the reaction time of participants when using CADe (median 2.90 seconds [99%CI 2.55–3.38]) was similar to that without CADe. CADe increased misinterpretation of normal mucosa and reduced the eye travel distance. Conclusions: Results confirm that CADe systems detect polyps faster than humans. However, use of CADe did not improve human reaction times. It increased misinterpretation of normal mucosa and decreased the eye travel distance. Possible consequences of these findings might be prolonged examination time and deskilling.

Journal ArticleDOI
TL;DR: In this article, the effects of mask wearing on infant social development were investigated in the context of face processing in early infancy, focusing on identifying sensitive periods during which being exposed to specific facial features or to the entire face configuration has been found to be important for the development of perceptive and socio-communicative skills.
Abstract: Human faces are one of the most prominent stimuli in the visual environment of young infants and convey critical information for the development of social cognition. During the COVID-19 pandemic, mask wearing has become a common practice outside the home environment. With masks covering nose and mouth regions, the facial cues available to the infant are impoverished. The impact of these changes on development is unknown but is critical to debates around mask mandates in early childhood settings. As infants grow, they increasingly interact with a broader range of familiar and unfamiliar people outside the home; in these settings, mask wearing could possibly influence social development. In order to generate hypotheses about the effects of mask wearing on infant social development, in the present work, we systematically review N = 129 studies selected based on the most recent PRISMA guidelines providing a state-of-the-art framework of behavioral studies investigating face processing in early infancy. We focused on identifying sensitive periods during which being exposed to specific facial features or to the entire face configuration has been found to be important for the development of perceptive and socio-communicative skills. For perceptive skills, infants gradually learn to analyze the eyes or the gaze direction within the context of the entire face configuration. This contributes to identity recognition as well as emotional expression discrimination. For socio-communicative skills, direct gaze and emotional facial expressions are crucial for attention engagement while eye-gaze cuing is important for joint attention. Moreover, attention to the mouth is particularly relevant for speech learning. We discuss possible implications of the exposure to masked faces for developmental needs and functions. Providing groundwork for further research, we encourage the investigation of the consequences of mask wearing for infants’ perceptive and socio-communicative development, suggesting new directions within the research field.

Book ChapterDOI
TL;DR: In this paper, a student-teacher transformer framework is proposed that learns from radiologists' visual search patterns, encoded as human visual attention regions, within a cascaded global-focal transformer architecture.
Abstract: In this work, we present RadioTransformer, a novel student-teacher transformer framework, that leverages radiologists’ gaze patterns and models their visuo-cognitive behavior for disease diagnosis on chest radiographs. Domain experts, such as radiologists, rely on visual information for medical image interpretation. On the other hand, deep neural networks have demonstrated significant promise in similar tasks even where visual interpretation is challenging. Eye-gaze tracking has been used to capture the viewing behavior of domain experts, lending insights into the complexity of visual search. However, deep learning frameworks, even those that rely on attention mechanisms, do not leverage this rich domain information for diagnostic purposes. RadioTransformer fills this critical gap by learning from radiologists’ visual search patterns, encoded as ‘human visual attention regions’ in a cascaded global-focal transformer framework. The overall ‘global’ image characteristics and the more detailed ‘local’ features are captured by the proposed global and focal modules, respectively. We experimentally validate the efficacy of RadioTransformer on 8 datasets involving different disease classification tasks where eye-gaze data is not available during the inference phase. Code: https://github.com/bmi-imaginelab/radiotransformer

Journal ArticleDOI
TL;DR: In this paper , the authors evaluated the temporal dynamics of distraction via eye-tracking measures in a VR classroom setting with 20 children diagnosed with ADHD between 8 and 12 years of age and found that while children did not always look at distractors themselves for long periods of time, the presence of a distractor disrupted on-task gaze at task-relevant whiteboard stimuli and lowered rates of task performance.
Abstract: Objective: Distractions inordinately impair attention in children with Attention-Deficit Hyperactivity Disorder (ADHD) but examining this behavior under real-life conditions poses a challenge for researchers and clinicians. Virtual reality (VR) technologies may mitigate the limitations of traditional laboratory methods by providing a more ecologically relevant experience. The use of eye-tracking measures to assess attentional functioning in a VR context in ADHD is novel. In this proof of principle project, we evaluate the temporal dynamics of distraction via eye-tracking measures in a VR classroom setting with 20 children diagnosed with ADHD between 8 and 12 years of age. Method: We recorded continuous eye movements while participants performed math, Stroop, and continuous performance test (CPT) tasks with a series of “real-world” classroom distractors presented. We analyzed the impact of the distractors on rates of on-task performance and on-task, eye-gaze (i.e., looking at a classroom whiteboard) versus off-task eye-gaze (i.e., looking away from the whiteboard). Results: We found that while children did not always look at distractors themselves for long periods of time, the presence of a distractor disrupted on-task gaze at task-relevant whiteboard stimuli and lowered rates of task performance. This suggests that children with attention deficits may have a hard time returning to tasks once those tasks are interrupted, even if the distractor itself does not hold attention. Eye-tracking measures within the VR context can reveal rich information about attentional disruption. Conclusions: Leveraging virtual reality technology in combination with eye-tracking measures is well-suited to advance the understanding of mechanisms underlying attentional impairment in naturalistic settings. Assessment within these immersive and well-controlled simulated environments provides new options for increasing our understanding of distractibility and its potential impact on the development of interventions for children with ADHD.
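
A small sketch of the kind of distractor analysis described above: comparing the share of gaze samples on the task-relevant whiteboard AOI in a window before versus after each distractor onset. AOI bounds, window length, and array layout are illustrative assumptions, not the study's pipeline.

```python
import numpy as np

def on_task_ratio(gaze_x, gaze_y, t, aoi, t_start, t_end):
    """Fraction of gaze samples inside the AOI within [t_start, t_end).

    aoi: (x1, y1, x2, y2) bounds of the whiteboard region.
    """
    mask = (t >= t_start) & (t < t_end)
    x, y = gaze_x[mask], gaze_y[mask]
    inside = (x >= aoi[0]) & (x <= aoi[2]) & (y >= aoi[1]) & (y <= aoi[3])
    return float(inside.mean()) if inside.size else np.nan

def distractor_effect(gaze_x, gaze_y, t, aoi, onsets, window=5.0):
    """For each distractor onset, on-task ratio before vs. after."""
    return [(on_task_ratio(gaze_x, gaze_y, t, aoi, o - window, o),
             on_task_ratio(gaze_x, gaze_y, t, aoi, o, o + window))
            for o in onsets]
```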

Proceedings ArticleDOI
08 Jun 2022
TL;DR: It is concluded that webcam-based eye tracking is a viable, low-cost alternative to remote eye tracking, and that the switch from ambient to focal attention depends on the complexity of the visual stimuli.
Abstract: We compare the measurement error and validity of webcam-based eye tracking to that of a remote eye tracker as well as software integration of both. We ran a study with n = 83 participants, consisting of a point detection task and an emotional visual search task under three between-subjects experimental conditions (webcam-based, remote, and integrated). We analyzed location-based (e.g., fixations) and process-based eye tracking metrics (ambient-focal attention dynamics). Despite higher measurement error of webcam eye tracking, our results in all three experimental conditions were in line with theoretical expectations. For example, time to first fixation toward happy faces was significantly shorter than toward sad faces (the happiness-superiority effect). As expected, we also observed the switch from ambient to focal attention depending on complexity of the visual stimuli. We conclude that webcam-based eye tracking is a viable, low-cost alternative to remote eye tracking.
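
As an illustration of one metric reported above, the sketch below computes time to first fixation toward an area of interest (for example, a happy face); the (start, end, x, y) fixation format is an assumed convention of an upstream fixation detector, not the study's code.

```python
def time_to_first_fixation(fixations, aoi, stimulus_onset):
    """Latency (s) from stimulus onset to the first fixation inside the AOI.

    fixations: iterable of (start, end, x, y); aoi: (x1, y1, x2, y2).
    Returns None if the AOI is never fixated after onset.
    """
    for start, end, x, y in sorted(fixations):
        if start >= stimulus_onset and \
           aoi[0] <= x <= aoi[2] and aoi[1] <= y <= aoi[3]:
            return start - stimulus_onset
    return None
```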


Journal ArticleDOI
TL;DR: In this article, the authors examined whether educators value certain benchmarks of quality (i.e., scaffolding, feedback, curriculum, development team, learning theory) when they select educational apps from app stores and evaluated how they gather information during the selection process.
Abstract: The study examines if educators are valuing certain benchmarks of quality (i.e., scaffolding, feedback, curriculum, development team, learning theory) when they select educational apps from app stores and evaluates how they gather information during the selection process. Pre-service and working elementary educators viewed and evaluated app store pages for 10 simulated apps while gaze data (i.e., looking at either the written descriptions or app images) were collected using an eye-tracker. Participants' value-judgements were measured by their willingness to download the app, how much they would pay, their rating, and ranking, while gaze data examined participants' fixation count and fixation duration. Results from paired-samples t-tests, repeated-measures ANOVAs, and nonparametric tests indicate that educators value apps with educational benchmarks over buzzwords, while judging apps with development team, scaffolding, and curriculum higher than those with an integrated learning theory and feedback. Moreover, eye tracking results revealed that educators scrutinize app images more when they feature educational benchmarks. To improve educators' app selection, professional development should target educators’ views of learning theory and feedback as well as their use of app images as a source of information on app quality (cf., detailed text descriptions).

Journal ArticleDOI
TL;DR: In this paper , the authors investigated whether a change in the colour of the interactive elements in the eye-controlled system interface was related to search duration. And they found that participants generally believed that the feedback form of reducing brightness was very natural, and the feedback forms of converting to the contrasting colour was very clear.

Journal ArticleDOI
TL;DR: In this article, the authors summarize previous studies on the application of eye-tracking techniques to the construction safety context through a systematic literature review and provide practical suggestions for future research and on-site safety management.
Abstract: Safety is the most important concern in the construction industry, and construction workers’ attention allocation is closely associated with their hazard recognition and safety behaviors. The recent emergence of eye-tracking techniques allows researchers in construction safety to further investigate construction workers’ visual attention allocation during hazard recognition. The existing eye-tracking studies in construction safety need to be comprehensively understood, to provide practical suggestions for future research and on-site safety management. This study aims to summarize previous studies on the application of eye-tracking techniques to the construction safety context through a systematic literature review. The literature search and study selection process included 22 eligible studies. Content analysis was then carried out from participant selection, device selection, task design, area of interest determination, feature extraction, data analysis, and main findings. Major limitations of the existing studies are identified, and recommendations for future research in theoretical development, experiment improvement, and data analysis method advancement are proposed to address these limitations. Even though the application of eye-tracking techniques in construction safety research is still in its early stage, it is worth future continuous attention because relevant discoveries would be of great significance to hazard control and safety management in the construction industry.

Journal ArticleDOI
01 Mar 2022 - Displays
TL;DR: In this article, the influence of dynamic images versus traditional static images on user perception in web interface visualizations of products was discussed; the results of an eye-movement experiment show that the visual cognitive effect was better for dynamic images than for static images, and the efficiency of visual search was improved.

Journal ArticleDOI
TL;DR: In this article, the effects of visual attention on the QoE of VR 360-degree videos were evaluated through subjective tests where participants watched degraded versions of 360-videos through a Head-Mounted Display with integrated eye-tracking sensors.
Abstract: The research domain on the Quality of Experience (QoE) of 2D video streaming has been well established. However, a new video format is emerging and gaining popularity and availability: VR 360-degree video. The processing and transmission of 360-degree videos brings along new challenges such as large bandwidth requirements and the occurrence of different distortions. The viewing experience is also substantially different from 2D video: it offers more interactive freedom on the viewing angle but can also be more demanding and cause cybersickness. The first goal of this article is to complement earlier research by Tran et al. (2017) [39] testing the effects of quality degradation, freezing, and content on the QoE of 360-videos. The second goal is to test the contribution of visual attention as an influence factor in the QoE assessment. Data was gathered through subjective tests where participants watched degraded versions of 360-videos through a Head-Mounted Display with integrated eye-tracking sensors. After each video they answered questions regarding their quality perception, experience, perceptual load, and cybersickness. Our results showed that the participants rated the overall QoE rather low, and the ratings decreased with added degradations and freezing events. Cybersickness was found not to be an issue. The effects of the manipulations on visual attention were minimal. Attention was mainly directed by content, but also by surprising elements. The addition of eye-tracking metrics did not further explain individual differences in subjective ratings. Nevertheless, it was found that looking at moving objects increased the negative effect of freezing events and made participants less sensitive to quality distortions. More research is needed to conclude whether visual attention is an influence factor on the QoE in 360-video.

Journal ArticleDOI
TL;DR: In this paper , the authors investigated the impact of online product reviews on consumers purchasing decisions by using eye-tracking and found that consumers' attention to negative comments was significantly greater than that to positive comments, especially for female consumers.
Abstract: This study investigated the impact of online product reviews on consumers' purchasing decisions by using eye-tracking. The research methodology involved (i) development of a conceptual framework of online product review and purchasing intention through the moderation role of gender and visual attention in comments, and (ii) empirical investigation into the region of interest (ROI) analysis of consumers' fixation during the purchase decision process and behavioral analysis. The results showed that consumers' attention to negative comments was significantly greater than that to positive comments, especially for female consumers. Furthermore, the study identified a significant correlation between the visual browsing behavior of consumers and their purchase intention. It also found that consumers were not able to identify false comments. The current study provides a deep understanding of the underlying mechanism of how online reviews influence shopping behavior, reveals for the first time the effect of gender on this influence and explains it from the perspective of attentional bias, which is essential for the theory of online consumer behavior. Specifically, the different effects of consumers' attention to negative comments seem to be moderated through gender, with female consumers' attention to negative comments being significantly greater than to positive ones. These findings suggest that practitioners need to pay particular attention to negative comments and resolve them promptly through the customization of product/service information, taking into consideration consumer characteristics, including gender.
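
A minimal sketch of a region-of-interest (ROI) fixation analysis like the one described above: fixation count and total dwell time per comment region. The ROI definitions and the fixation tuple format are assumptions for illustration.

```python
from collections import defaultdict

def roi_metrics(fixations, rois):
    """fixations: (start, end, x, y); rois: {name: (x1, y1, x2, y2)}."""
    counts = defaultdict(int)
    dwell = defaultdict(float)
    for start, end, x, y in fixations:
        for name, (x1, y1, x2, y2) in rois.items():
            if x1 <= x <= x2 and y1 <= y <= y2:
                counts[name] += 1          # fixation count per ROI
                dwell[name] += end - start  # total dwell time per ROI
    return dict(counts), dict(dwell)

# e.g. rois = {"positive_comments": (0, 0, 500, 400),
#              "negative_comments": (0, 400, 500, 800)}
```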

Journal ArticleDOI
TL;DR: Machine learning models trained on visualized eye-tracking scan-path images can support early autism screening; a deep neural network achieved 97% AUC, 93.28% sensitivity, and 91.38% specificity under fivefold cross-validation.
Abstract: Autism spectrum disorder is a group of disorders marked by difficulties with social skills, repetitive activities, speech, and nonverbal communication. Deficits in paying attention to, and processing, social stimuli are common for children with autism spectrum disorders. It is uncertain whether eye-tracking technologies can assist in establishing an early biomarker of autism based on the children’s atypical visual preference patterns. In this study, we used machine learning methods to test the applicability of eye-tracking data in children to aid in the early screening of autism. We looked into the effectiveness of various machine learning techniques to discover the best model for predicting autism using visualized eye-tracking scan path images. We adopted three traditional machine learning models and a deep neural network classifier to run experimental trials. This study employed a publicly available dataset of 547 graphical eye-tracking scan paths from 328 typically developing and 219 autistic children. We used image augmentation to populate the dataset to prevent the model from overfitting. The deep neural network model outperformed typical machine learning approaches on the populated dataset, with 97% AUC, 93.28% sensitivity, 91.38% specificity, 94.46% NPV, and 90.06% PPV (fivefold cross-validated). The findings strongly suggest that eye-tracking data can help clinicians perform quick and reliable autism screening.
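
A hedged sketch of the evaluation setup described above: fivefold cross-validated AUC on flattened scan-path images. The paper itself uses a deep neural network with image augmentation, so the random-forest estimator and placeholder data here are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# X: (n_samples, H*W) flattened scan-path images; y: 1 = autistic, 0 = TD.
X = np.random.rand(547, 64 * 64)   # placeholder for the real dataset
y = np.random.randint(0, 2, 547)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"fivefold AUC: {auc.mean():.3f} ± {auc.std():.3f}")
```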

Journal ArticleDOI
TL;DR: This paper proposes a workflow and software architecture that encompasses an entire experimental scenario, including virtual scene preparation and operationalization of visual stimuli, experimental data collection and considerations for ambiguous visual stimuli, post-hoc data correction, data aggregation, and visualization.
Abstract: Not all eye-tracking methodology and data processing are equal. While the use of eye-tracking is intricate because of its grounding in visual physiology, traditional 2D eye-tracking methods are supported by software, tools, and reference studies. This is not so true for eye-tracking methods applied in virtual reality (imaginary 3D environments). Previous research regarded the domain of eye-tracking in 3D virtual reality as an untamed realm with unaddressed issues. The present paper explores these issues, discusses possible solutions at a theoretical level, and offers example implementations. The paper also proposes a workflow and software architecture that encompasses an entire experimental scenario, including virtual scene preparation and operationalization of visual stimuli, experimental data collection and considerations for ambiguous visual stimuli, post-hoc data correction, data aggregation, and visualization. The paper is accompanied by examples of eye-tracking data collection and evaluation based on ongoing research of indoor evacuation behavior.
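
One recurring operation in the 3D eye-tracking workflow discussed above is mapping a gaze ray to the virtual scene object it hits. The sketch below does this with a ray-sphere intersection test over a dictionary of objects; the scene representation and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaze_hit(origin, direction, objects):
    """Return the name of the nearest object hit by the gaze ray, or None.

    objects: {name: (center, radius)} with spheres approximating objects.
    """
    d = direction / np.linalg.norm(direction)
    best, best_t = None, np.inf
    for name, (center, radius) in objects.items():
        oc = origin - center
        b = np.dot(oc, d)
        disc = b * b - (np.dot(oc, oc) - radius ** 2)
        if disc < 0:
            continue  # ray misses this sphere
        t = -b - np.sqrt(disc)
        if 0 < t < best_t:
            best, best_t = name, t
    return best
```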

Journal ArticleDOI
TL;DR: In this paper , the authors used a mobile health app to collect over 11 hours of video footage depicting 95 children engaged in gameplay in a natural home environment, and compared the gaze fixation and visual scanning methods used by children during a 90-second gameplay video to identify statistically significant differences between the two cohorts; they then trained a long short-term memory (LSTM) neural network to determine if gaze indicators could be predictive of ASD.
Abstract: Background: Autism spectrum disorder (ASD) is a widespread neurodevelopmental condition with a range of potential causes and symptoms. Standard diagnostic mechanisms for ASD, which involve lengthy parent questionnaires and clinical observation, often result in long waiting times for results. Recent advances in computer vision and mobile technology hold potential for speeding up the diagnostic process by enabling computational analysis of behavioral and social impairments from home videos. Such techniques can improve objectivity and contribute quantitatively to the diagnostic process. Objective: In this work, we evaluate whether home videos collected from a game-based mobile app can be used to provide diagnostic insights into ASD. To the best of our knowledge, this is the first study attempting to identify potential social indicators of ASD from mobile phone videos without the use of eye-tracking hardware, manual annotations, and structured scenarios or clinical environments. Methods: Here, we used a mobile health app to collect over 11 hours of video footage depicting 95 children engaged in gameplay in a natural home environment. We used automated data set annotations to analyze two social indicators that have previously been shown to differ between children with ASD and their neurotypical (NT) peers: (1) gaze fixation patterns, which represent regions of an individual’s visual focus and (2) visual scanning methods, which refer to the ways in which individuals scan their surrounding environment. We compared the gaze fixation and visual scanning methods used by children during a 90-second gameplay video to identify statistically significant differences between the 2 cohorts; we then trained a long short-term memory (LSTM) neural network to determine if gaze indicators could be predictive of ASD. Results: Our results show that gaze fixation patterns differ between the 2 cohorts; specifically, we could identify 1 statistically significant region of fixation (P<.001). In addition, we also demonstrate that there are unique visual scanning patterns that exist for individuals with ASD when compared to NT children (P<.001). A deep learning model trained on coarse gaze fixation annotations demonstrates mild predictive power in identifying ASD. Conclusions: Ultimately, our study demonstrates that heterogeneous video data sets collected from mobile devices hold potential for quantifying visual patterns and providing insights into ASD. We show the importance of automated labeling techniques in generating large-scale data sets while simultaneously preserving the privacy of participants, and we demonstrate that specific social engagement indicators associated with ASD can be identified and characterized using such data.
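
A hedged sketch of an LSTM classifier over per-frame gaze-region codes, mirroring the modeling approach described above. The input encoding (one-hot fixation regions per frame) and all hyperparameters are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GazeLSTM(nn.Module):
    def __init__(self, n_regions=9, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_regions, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)    # logit for ASD vs. NT

    def forward(self, x):                   # x: (batch, time, n_regions)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1]).squeeze(-1)

model = GazeLSTM()
frames = torch.zeros(4, 90, 9)              # 4 clips, 90 time steps each
frames[..., 0] = 1.0                        # toy one-hot region codes
prob = torch.sigmoid(model(frames))         # P(ASD) per clip
```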