Proceedings of Machine Learning Research

Proceedings of The 1st Gaze Meets ML workshop
Held in New Orleans, USA on 03 December 2022
Published as Volume 210 by the Proceedings of Machine Learning Research on 03 April 2023.
Volume Edited by: Ismini Lourentzou, Joy Wu, Satyananda Kashyap, Alexandros Karargyris, Leo Anthony Celi, Ban Kawas, Sachin Talathi
Series Editors: Neil D. Lawrence
https://proceedings.mlr.press/v210/
Wed, 24 Apr 2024 08:35:50 +0000

SecNet: Semantic Eye Completion in Implicit Field
If we take a depth image of an eye, noise artifacts and holes significantly affect the depth values on the eye due to the specularity of the sclera. This paper aims at solving this problem through semantic shape completion. We propose an end-to-end approach to train a neural network, called SecNet (semantic eye completion network), that predicts a point cloud with an accurate eye geometry coupled with the semantic labels of each point. These labels correspond to the essential eye regions, i.e. pupil, iris and sclera. In particular, our work performs implicit estimation of the query points with semantic labels, where both the semantic and occupancy predictions are trained in an end-to-end way. To evaluate the approach, we use synthetic eye scans rendered in the UnityEyes simulator environment. Compared to the state of the art, the proposed method improves the accuracy of shape completion for 3D eye scans by 8.2%. In practice, we also demonstrate the application of our semantic eye completion to gaze estimation.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/wang23a.html

Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models
Eye-tracking has the potential to provide rich behavioral data about human cognition in ecologically valid environments. However, analyzing this rich data is often challenging.
Most automated analyses are specific to simplistic artificial visual stimuli with well-separated, static regions of interest, while most analyses in the context of complex visual stimuli, such as most natural scenes, rely on laborious and time-consuming manual annotation. This paper studies using computer vision tools for “attention decoding”, the task of assessing the locus of a participant’s overt visual attention over time. We provide a publicly available Multiple Object Eye-Tracking (MOET) dataset, consisting of gaze data from participants tracking specific objects, annotated with labels and bounding boxes, in crowded real-world videos, for training and evaluating attention decoding algorithms. We also propose two end-to-end deep learning models for attention decoding and compare these to state-of-the-art heuristic methods.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/uppal23a.html

Learning to count visual objects by combining “what” and “where” in recurrent memory
Counting the number of objects in a visual scene is easy for humans but challenging for modern deep neural networks. Here we explore what makes this problem hard and study the neural computations that allow transfer of counting ability to new objects and contexts. Previous work has implicated posterior parietal cortex (PPC) in numerosity perception and in visual scene understanding more broadly. It has been proposed that action-related saccadic signals computed in PPC provide object-invariant information about the number and arrangement of scene elements, and may contribute to relational reasoning in visual displays. Here, we built a glimpsing recurrent neural network that combines gaze contents (“what”) and gaze location (“where”) to count the number of items in a visual array. The network successfully learns to count and generalizes to several out-of-distribution test sets, including images with novel items.
Through ablations and comparison to control models, we establish the contribution of brain-inspired computational principles to this generalization ability. This work provides a proof-of-principle demonstration that a neural network that combines “what” and “where” can learn a generalizable concept of numerosity and points to a promising approach for other visual reasoning tasks.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/thompson23a.html

Skill, or Style? Classification of Fetal Sonography Eye-Tracking Data
We present a method for classifying human skill at fetal ultrasound scanning from eye-tracking and pupillary data of sonographers. Human skill characterization for this clinical task typically creates groupings of clinician skills such as expert and beginner based on the number of years of professional experience; experts typically have more than 10 years and beginners between 0-5 years. In some cases, they also include trainees who are not yet fully-qualified professionals. Prior work has considered eye movements, which necessitates separating eye-tracking data into eye movements such as fixations and saccades. Our method does not use prior assumptions about the relationship between years of experience and skill, and does not require the separation of eye-tracking data. Our best performing skill classification model achieves F1 scores of 98% and 70% for the expert and trainee classes, respectively. We also show that years of experience, as a direct measure of skill, is significantly correlated with the expertise of a sonographer.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/teng23a.html

Facial Composite Generation with Iterative Human Feedback
We propose the first method in which human and AI collaborate to iteratively reconstruct the human’s mental image of another person’s face only from their eye gaze.
Current tools for generating digital human faces involve a tedious and time-consuming manual design process. While gaze-based mental image reconstruction represents a promising alternative, previous methods still assumed prior knowledge about the target face, thereby severely limiting their practical usefulness. The key novelty of our method is a collaborative, iterative query engine: based on the user’s gaze behaviour in each iteration, our method predicts which images to show to the user in the next iteration. Results from two human studies (N=12 and N=22) show that our method can visually reconstruct digital faces that are more similar to the mental image, and is more usable compared to other methods. As such, our findings point to the significant potential of human-AI collaboration for reconstructing mental images, potentially also beyond faces, and of human gaze as a rich source of information and a powerful mediator in said collaboration.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/strohm23a.html

Intention Estimation via Gaze for Robot Guidance in Hierarchical Tasks
To provide effective guidance to a human agent performing hierarchical tasks, a robot must determine the level at which to provide guidance. This relies on estimating the agent’s intention at each level of the hierarchy. Unfortunately, observations of task-related movements only provide direct information about intention at the lowest level. In addition, lower level tasks may be shared. The resulting ambiguity impairs timely estimation of higher level intent. This can be resolved by incorporating observations of secondary behaviors like gaze. We propose a probabilistic framework enabling robot guidance in hierarchical tasks via intention estimation from observations of both task-related movements and eye gaze.
Experiments with a virtual humanoid robot demonstrate that gaze is a very powerful cue that largely compensates for simplifying assumptions made in modelling task-related movements, enabling a robot controlled by our framework to nearly match the performance of a human wizard. We examine the effect of gaze in improving both the precision and timeliness of guidance cue generation, finding that while both improve with gaze, improvements in timeliness are more significant. Our results suggest that gaze observations are critical in achieving natural and fluid human-robot collaboration, which may enable human agents to undertake significantly more complex tasks and perform them more safely and effectively than is possible without guidance.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/shen23a.html

Appearance-Based Gaze Estimation for Driver Monitoring
Driver inattention is a leading cause of road accidents through its impact on reaction time in the face of incidents. In the case of Level-3 (L3) vehicles, inattention adversely impacts the quality of driver takeover and therefore the safe performance of L3 vehicles. There is a high correlation between a driver’s visual attention and eye movement. Gaze angle is an excellent surrogate for assessing driver attention zones, in both cabin interior and on-road scenarios. We propose appearance-based gaze estimation approaches using convolutional neural networks (CNNs) to estimate gaze angle directly from eye images and also from eye landmark coordinates. The goal is to improve learning by utilizing synthetic data with more accurate annotations. Performance analysis shows that our proposed landmark-based model, trained synthetically, is capable of predicting gaze angle in the real data with a reasonable angular error.
In addition, we discuss how evaluation metrics are application-specific and argue that there is a crucial need for an assessment metric more reliable than the common mean angular error for measuring the driver’s gaze direction in L3 autonomy, so that a control takeover request can be issued at a proper time corresponding to the driver’s attention focus, avoiding ambiguities.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/nikan23a.html

Integrating eye gaze into machine learning using fractal curves
Eye gaze tracking has traditionally employed a camera to capture a participant’s eye movements and characterise their visual fixations. However, gaze pattern recognition is still challenging. This is due to both gaze point sparsity and the seemingly random approach participants take to viewing unfamiliar stimuli without a set task. Our paper proposes a method for integrating eye gaze into machine learning by converting a fixation’s two-dimensional (x, y) coordinate into a one-dimensional Hilbert curve distance metric, making it well suited for implementation into machine learning. We will compare this approach to a traditional grid-based string substitution technique, with an example implementation demonstrated in a Support Vector Machine and Convolutional Neural Network. Finally, a comparison will be made to examine which method performs better. Results have shown that this method can be useful both to dynamically quantise scanpaths for tuning statistical significance in large datasets, and to investigate the nuances of similarity found in shared bottom-up processing when participants observe unfamiliar stimuli in a free viewing experiment. Real world applications can include expertise-related eye gaze prediction, medical screening, and image saliency identification.
Keywords: Neuroscience, eye tracking, fractals, support vector machine, convolutional neural network.
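The abstract above does not spell out how a fixation becomes a Hilbert curve distance, so the following is only a minimal sketch of one common way to do it, using the standard bit-manipulation mapping from a grid cell to its index along the curve. The function names, the pixel-to-grid quantisation, the `order` parameter, and the sample scanpath are all assumptions made for illustration and are not taken from the paper.

```python
def hilbert_index(n, x, y):
    """Distance of grid cell (x, y) along a Hilbert curve filling an
    n-by-n grid, where n is a power of two. Result is in 0 .. n*n - 1."""
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant of the current sub-square holds the cell?
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def fixation_to_hilbert(px, py, width, height, order=4):
    """Quantise a fixation in pixel coordinates onto a 2**order grid
    (an assumed preprocessing step) and return its Hilbert distance."""
    n = 1 << order
    gx = min(n - 1, max(0, int(px / width * n)))
    gy = min(n - 1, max(0, int(py / height * n)))
    return hilbert_index(n, gx, gy)


# Hypothetical scanpath on a 1280x720 stimulus, encoded as a 1D sequence.
scanpath = [(100, 80), (640, 360), (1200, 700)]
encoded = [fixation_to_hilbert(x, y, 1280, 720, order=3) for x, y in scanpath]
```

In this sketch, raising or lowering `order` changes the grid resolution, which is one plausible reading of how a scanpath could be quantised at different granularities. Cells that are consecutive along the curve are always neighbours on the grid, which is the locality property that makes the one-dimensional encoding attractive.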
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/newport23a.html

Preface
Preface to GMML 2022
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/lourentzou23a.html

Modeling Human Eye Movements with Neural Networks in a Maze-Solving Task
From smoothly pursuing moving objects to rapidly shifting gazes during visual search, humans employ a wide variety of eye movement strategies in different contexts. While eye movements provide a rich window into mental processes, building generative models of eye movements is notoriously difficult, and to date the computational objectives guiding eye movements remain largely a mystery. In this work, we tackled these problems in the context of a canonical spatial planning task, maze-solving. We collected eye movement data from human subjects and built deep generative models of eye movements using a novel differentiable architecture for gaze fixations and gaze shifts. We found that human eye movements are best predicted by a model that is optimized not to perform the task as efficiently as possible but instead to run an internal simulation of an object traversing the maze. This not only provides a generative model of eye movements in this task but also suggests a computational theory for how humans solve the task, namely that humans use mental simulation.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/li23a.html

Selection of XAI Methods Matters: Evaluation of Feature Attribution Methods for Oculomotoric Biometric Identification
Substantial advances in oculomotoric biometric identification have been made due to deep neural networks processing non-aggregated time series data that replace methods processing theoretically motivated engineered features.
However, interpretability of deep neural networks is not trivial and needs to be thoroughly investigated for future eye tracking applications. Especially in medical or legal applications, explanations can be required to be provided alongside predictions. In this work, we apply several attribution methods to a state-of-the-art model for eye movement-based biometric identification. To assess the quality of the generated attributions, this work focuses on the quantitative evaluation of a range of established metrics. We find that Layer-wise Relevance Propagation generates the least complex attributions, while DeepLIFT attributions are the most faithful. Due to the absence of a correlation between the attributions of these two methods, we advocate considering both methods for their potentially complementary attributions.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/krakowczyk23a.html

Electrode Clustering and Bandpass Analysis of EEG Data for Gaze Estimation
In this study, we validate the findings of previously published papers, showing the feasibility of electroencephalography (EEG)-based gaze estimation. Moreover, we extend previous research by demonstrating that with only a slight drop in model performance, we can significantly reduce the number of electrodes, indicating that a high-density, expensive EEG cap is not necessary for the purposes of EEG-based eye tracking. Using data-driven approaches, we establish which electrode clusters impact gaze estimation and how the different types of EEG data preprocessing affect the models’ performance. Finally, we also inspect which recorded frequencies are most important for the defined tasks.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/kastrati23a.html

Contrastive Representation Learning for Gaze Estimation
Self-supervised learning (SSL) has become prevalent for learning representations in computer vision. Notably, SSL exploits contrastive learning to encourage visual representations to be invariant under various image transformations. The task of gaze estimation, on the other hand, demands not just invariance to various appearances but also equivariance to the geometric transformations. In this work, we propose a simple contrastive representation learning framework for gaze estimation, named Gaze Contrastive Learning (GazeCLR). GazeCLR exploits multi-view data to promote equivariance and relies on selected data augmentation techniques that do not alter gaze directions for invariance learning. Our experiments demonstrate the effectiveness of GazeCLR for several settings of the gaze estimation task. Particularly, our results show that GazeCLR improves the performance of cross-domain gaze estimation and yields as high as 17.2% relative improvement. Moreover, the GazeCLR framework is competitive with state-of-the-art representation learning methods for few-shot evaluation. The code and pre-trained models are available at https://github.com/jswati31/gazeclr.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/jindal23a.html

Federated Learning for Appearance-based Gaze Estimation in the Wild
Gaze estimation methods have significantly matured in recent years, but the large number of eye images required to train deep learning models poses significant privacy risks. In addition, the heterogeneous data distribution across different users can significantly hinder the training process. In this work, we propose the first federated learning approach for gaze estimation to preserve the privacy of gaze data.
We further employ pseudo-gradient optimisation to adapt our federated learning approach to the divergent model updates, addressing the heterogeneous nature of in-the-wild gaze data in collaborative setups. We evaluate our approach on a real-world dataset (MPIIGaze) and show that our work enhances the privacy guarantees of conventional appearance-based gaze estimation methods, handles the convergence issues of gaze estimators, and significantly outperforms vanilla federated learning by 15.8% (from a mean error of 10.63 degrees to 8.95 degrees). As such, our work paves the way to develop privacy-aware collaborative learning setups for gaze estimation while maintaining the model’s performance.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/elfares23a.html

Generating Attention Maps from Eye-gaze for the Diagnosis of Alzheimer’s Disease
Convolutional neural networks (CNNs) are currently the best computational methods for the diagnosis of Alzheimer’s disease (AD) from neuroimaging. CNNs are able to automatically learn a hierarchy of spatial features, but they are not optimized to incorporate domain knowledge. In this work we study the generation of attention maps based on a human expert’s gaze over the brain scans (domain knowledge) to guide the deep model to focus on the more relevant regions for AD diagnosis. Two strategies to generate the maps from eye-gaze were investigated: the use of average class maps and supervising a network to generate the attention maps. These approaches were compared with masking (hard attention) with regions of interest (ROI) and CNNs with traditional attention mechanisms. For our experiments, we used positron emission tomography (PET) scans from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database.
For the task of normal control (NC) vs Alzheimer’s (AD), the best performing model was the one with insertion of regions of interest (ROI), which achieved 95.6% accuracy, 0.4% higher than the baseline CNN.
Keywords: Deep learning; Alzheimer’s disease; Convolutional neural network; Attention mechanism; Eye tracking; Computer-aided diagnosis.
Mon, 03 Apr 2023 00:00:00 +0000
https://proceedings.mlr.press/v210/antunes23a.html