Head and Body Orientation Estimation with Sparse Weak Labels in Free Standing Conversational Settings

Stephanie Tan, David M.J. Tax, Hayley Hung
Understanding Social Behavior in Dyadic and Small Group Interactions, PMLR 173:179-203, 2022.

Abstract

We focus on estimating human head and body orientations, which are crucial social cues in free-standing conversational settings. Automatic estimation of head and body orientations enables downstream research on conversation involvement, influence, and other social concepts. However, in-the-wild human behavior and long-interaction datasets are difficult to collect and expensive to annotate. Our approach mitigates the need for a large number of training labels by casting the task as a transductive low-rank matrix-completion problem over sparsely labelled data. We differentiate our learning setting from the data-intensive setting required by existing supervised deep learning methods. When little labelled data is available, our method takes advantage of the inherent properties and dynamics of the social scenario by leveraging different sources of information and physical priors. Our method (1) is data efficient, using only a small number of annotated labels, (2) ensures temporal smoothness in predictions, (3) adheres to the human anatomical constraints on head-body orientation differences, and (4) exploits weak labels from multimodal wearable sensors. We benchmark this method on the challenging multimodal SALSA dataset, the only large-scale dataset that contains video, proximity-sensor, and microphone audio data. Using only 5% of all labels as training samples, we report 65% and 76% average classification accuracy for head and body orientations, an 8% and 16% increase respectively over the previous state-of-the-art performance under the same transductive setting.

Cite this Paper


BibTeX
@InProceedings{pmlr-v173-tan22a,
  title     = {Head and Body Orientation Estimation with Sparse Weak Labels in Free Standing Conversational Settings},
  author    = {Tan, Stephanie and Tax, David M.J. and Hung, Hayley},
  booktitle = {Understanding Social Behavior in Dyadic and Small Group Interactions},
  pages     = {179--203},
  year      = {2022},
  editor    = {Palmero, Cristina and Jacques Junior, Julio C. S. and Clapés, Albert and Guyon, Isabelle and Tu, Wei-Wei and Moeslund, Thomas B. and Escalera, Sergio},
  volume    = {173},
  series    = {Proceedings of Machine Learning Research},
  month     = {16 Oct},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v173/tan22a/tan22a.pdf},
  url       = {https://proceedings.mlr.press/v173/tan22a.html},
  abstract  = {We focus on estimating human head and body orientations which are crucial social cues in free-standing conversational settings. Automatic estimations of head and body orientations enable downstream research about conversation involvement, influence, and other social concepts. However, in-the-wild human behavior and long interaction datasets are difficult to collect and expensive to annotate. Our approach mitigates the need for large number of training labels by casting the task into a transductive low-rank matrix-completion problem using sparsely labelled data. We differentiate our learning setting from the typical data-intensive setting required for existing supervised deep learning methods. In situations of low labelled data availability, our method takes advantage of the inherent properties and dynamics of the social scenarios by leveraging different sources of information and physical priors. Our method is (1) data efficient and uses a small number of annotated labels, (2) ensures temporal smoothness in predictions, (3) adheres to human anatomical constraints of head and body orientation differences, and (4) exploits weak labels from multimodal wearable sensors. We benchmark this method on the challenging multimodal SALSA dataset, the only large scale dataset that contains video, proximity sensors and microphone audio data. When only using 5% of all the labels as training samples, we report 65% and 76% averaged classification accuracy for head and body orientations, which is an 8% and 16% respective increase compared to previous state-of-the-art performance under the same transductive setting.}
}
Endnote
%0 Conference Paper
%T Head and Body Orientation Estimation with Sparse Weak Labels in Free Standing Conversational Settings
%A Stephanie Tan
%A David M.J. Tax
%A Hayley Hung
%B Understanding Social Behavior in Dyadic and Small Group Interactions
%C Proceedings of Machine Learning Research
%D 2022
%E Cristina Palmero
%E Julio C. S. Jacques Junior
%E Albert Clapés
%E Isabelle Guyon
%E Wei-Wei Tu
%E Thomas B. Moeslund
%E Sergio Escalera
%F pmlr-v173-tan22a
%I PMLR
%P 179--203
%U https://proceedings.mlr.press/v173/tan22a.html
%V 173
%X We focus on estimating human head and body orientations which are crucial social cues in free-standing conversational settings. Automatic estimations of head and body orientations enable downstream research about conversation involvement, influence, and other social concepts. However, in-the-wild human behavior and long interaction datasets are difficult to collect and expensive to annotate. Our approach mitigates the need for large number of training labels by casting the task into a transductive low-rank matrix-completion problem using sparsely labelled data. We differentiate our learning setting from the typical data-intensive setting required for existing supervised deep learning methods. In situations of low labelled data availability, our method takes advantage of the inherent properties and dynamics of the social scenarios by leveraging different sources of information and physical priors. Our method is (1) data efficient and uses a small number of annotated labels, (2) ensures temporal smoothness in predictions, (3) adheres to human anatomical constraints of head and body orientation differences, and (4) exploits weak labels from multimodal wearable sensors. We benchmark this method on the challenging multimodal SALSA dataset, the only large scale dataset that contains video, proximity sensors and microphone audio data. When only using 5% of all the labels as training samples, we report 65% and 76% averaged classification accuracy for head and body orientations, which is an 8% and 16% respective increase compared to previous state-of-the-art performance under the same transductive setting.
APA
Tan, S., Tax, D.M.J. & Hung, H. (2022). Head and Body Orientation Estimation with Sparse Weak Labels in Free Standing Conversational Settings. Understanding Social Behavior in Dyadic and Small Group Interactions, in Proceedings of Machine Learning Research 173:179-203. Available from https://proceedings.mlr.press/v173/tan22a.html.
