Facial Expression and Peripheral Physiology Fusion to Decode Individualized Affective Experience
Proceedings of IJCAI 2018 2nd Workshop on Artificial Intelligence in Affective Computing, PMLR 86:10-26, 2020.
Abstract
Predicting affective experience from different data modalities measured from an individual, such as facial expressions or physiological signals, has received substantial research attention in recent years. However, most studies ignore the fact that people, besides responding differently to affective stimuli, may also have different resting dynamics (embedded in both facial and physiological patterns) to begin with. In this paper, we present a multimodal approach that simultaneously analyzes facial movements and several peripheral physiological signals to decode individualized affective experiences under positive and negative emotional contexts, while considering each person's resting dynamics. We propose a person-specific recurrence network to quantify the dynamics present in the person's facial movements and physiological data. Facial movement is represented using a robust head vs. 3D face landmark localization and tracking approach, and physiological data are processed by extracting known attributes related to the underlying affective experience. The dynamical coupling between the input modalities is then assessed by extracting several complex recurrent network metrics. Inference models are then trained using these metrics as features to predict an individual's affective experience in a given context, after their resting dynamics are excluded from their response. We validated our approach using a multimodal dataset consisting of (i) facial videos and (ii) several peripheral physiological signals, synchronously recorded from 12 participants while they watched 4 emotion-eliciting video-based stimuli. The results showed that our multimodal fusion method improves prediction accuracy by up to 19% compared to prediction using only one or a subset of the input modalities. In addition, accounting for individualized resting dynamics improved affective experience prediction.
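To make the idea of a recurrence-based feature concrete, the following is a minimal sketch and not the paper's implementation. It assumes a signal is given as a list of state vectors, builds a thresholded recurrence matrix, and computes one simple metric (the recurrence rate). The function names, the threshold `eps`, and the subtraction used to exclude resting dynamics are all illustrative assumptions; the abstract does not specify which metrics are used or how the resting baseline is removed.

```python
from math import dist


def recurrence_matrix(states, eps):
    """Return R with R[i][j] = 1 when states i and j lie within eps of each other.

    `states` is a sequence of equal-length tuples (one state vector per time step).
    The threshold `eps` is a hypothetical parameter chosen for illustration.
    """
    return [[1 if dist(a, b) <= eps else 0 for b in states] for a in states]


def recurrence_rate(R):
    """Fraction of off-diagonal pairs that recur (edge density of the network)."""
    n = len(R)
    if n < 2:
        return 0.0
    total = sum(sum(row) for row in R)
    diagonal = sum(R[i][i] for i in range(n))
    return (total - diagonal) / (n * (n - 1))


def rest_corrected(response_feature, rest_feature):
    """Assumed baseline correction: subtract the resting value from the response."""
    return response_feature - rest_feature


# Toy one-dimensional signals, made up purely for illustration.
rest_signal = [(0.0,), (1.0,), (0.0,), (1.0,)]
response_signal = [(0.0,), (0.05,), (0.0,), (0.05,)]

rest_rr = recurrence_rate(recurrence_matrix(rest_signal, eps=0.1))
response_rr = recurrence_rate(recurrence_matrix(response_signal, eps=0.1))
feature = rest_corrected(response_rr, rest_rr)
```

In this sketch, `feature` is the kind of per-person, baseline-corrected scalar that could be fed to an inference model alongside other metrics. A multimodal variant would compare states across two signals rather than within one, which is left out here.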