Head and Body Orientation Estimation with Sparse Weak Labels in Free Standing Conversational Settings
Understanding Social Behavior in Dyadic and Small Group Interactions, PMLR 173:179-203, 2022.
Abstract
We focus on estimating human head and body orientations, which are crucial social cues in free-standing conversational settings. Automatic estimation of head and body orientations enables downstream research on conversational involvement, influence, and other social concepts. However, in-the-wild human behavior and long interaction datasets are difficult to collect and expensive to annotate. Our approach mitigates the need for a large number of training labels by casting the task as a transductive low-rank matrix-completion problem over sparsely labelled data. This learning setting differs from the typical data-intensive setting required by existing supervised deep learning methods. When little labelled data is available, our method takes advantage of the inherent properties and dynamics of social scenarios by leveraging different sources of information and physical priors. Our method (1) is data efficient, using a small number of annotated labels, (2) ensures temporal smoothness in predictions, (3) adheres to human anatomical constraints on head and body orientation differences, and (4) exploits weak labels from multimodal wearable sensors. We benchmark this method on the challenging multimodal SALSA dataset, the only large-scale dataset that contains video, proximity sensor, and microphone audio data. Using only 5% of all the labels as training samples, we report 65% and 76% averaged classification accuracy for head and body orientations, respectively, which are increases of 8% and 16% over the previous state-of-the-art performance under the same transductive setting.
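To make the idea of low-rank matrix completion from sparse labels concrete, the following is a minimal sketch of a generic completion routine based on iterative singular-value soft-thresholding. It is not the method of the paper: the temporal smoothness term, the anatomical head-body constraint, and the weak sensor labels are all omitted, and the names `soft_impute`, `lam`, and `n_iters`, as well as the toy data, are assumptions made purely for illustration.

```python
import numpy as np


def soft_impute(M, mask, lam=1.0, n_iters=200):
    """Estimate a low-rank matrix from the entries of M where mask is True.

    Illustrative only (hypothetical helper, not the paper's algorithm).
    M    : 2-D array; values where mask is False are ignored.
    mask : boolean array, True for observed (labelled) entries.
    lam  : soft-threshold applied to the singular values.
    """
    X = np.where(mask, M, 0.0)  # start with zeros in the unobserved entries
    for _ in range(n_iters):
        # Keep observed entries fixed, use the current estimate elsewhere.
        filled = np.where(mask, M, X)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Shrinking singular values pushes the estimate towards low rank.
        s = np.maximum(s - lam, 0.0)
        X = (U * s) @ Vt
    return X


# Toy usage: 40 "frames" by 8 "orientation class scores", true rank 2.
rng = np.random.default_rng(0)
truth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 8))
mask = rng.random(truth.shape) < 0.5  # assume half of the entries are observed
estimate = soft_impute(truth, mask, lam=0.5)
hidden_error = np.abs(estimate - truth)[~mask].mean()
print("mean absolute error on hidden entries:", hidden_error)
```

In a transductive setting like the one described above, the unlabelled test samples are part of the same matrix as the labelled ones, so the completed entries themselves serve as the predictions.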