Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios

German Barquero, Johnny Núñez, Zhen Xu, Sergio Escalera, Wei-Wei Tu, Isabelle Guyon, Cristina Palmero
Understanding Social Behavior in Dyadic and Small Group Interactions, PMLR 173:107-138, 2022.

Abstract

Human behavior forecasting during human-human interactions is of utmost importance to provide robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the very recently released UDIVA v0.5, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400ms), we outperform the baselines even for a considerably longer-term future (up to 2s). We also show that this finding holds when highly noisy annotations are used, which opens new horizons towards the use of weakly-supervised learning. Combined with large-scale datasets, this may help boost the advances in this field.

Cite this Paper


BibTeX
@InProceedings{pmlr-v173-barquero22a,
  title     = {Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios},
  author    = {Barquero, German and N{\'u}{\~n}ez, Johnny and Xu, Zhen and Escalera, Sergio and Tu, Wei-Wei and Guyon, Isabelle and Palmero, Cristina},
  booktitle = {Understanding Social Behavior in Dyadic and Small Group Interactions},
  pages     = {107--138},
  year      = {2022},
  editor    = {Palmero, Cristina and Jacques Junior, Julio C. S. and Clap{\'e}s, Albert and Guyon, Isabelle and Tu, Wei-Wei and Moeslund, Thomas B. and Escalera, Sergio},
  volume    = {173},
  series    = {Proceedings of Machine Learning Research},
  month     = {16 Oct},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v173/barquero22a/barquero22a.pdf},
  url       = {https://proceedings.mlr.press/v173/barquero22a.html},
  abstract  = {Human behavior forecasting during human-human interactions is of utmost importance to provide robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the very recently released UDIVA v0.5, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400ms), we outperform the baselines even for a considerably longer-term future (up to 2s). We also show that this finding holds when highly noisy annotations are used, which opens new horizons towards the use of weakly-supervised learning. Combined with large-scale datasets, this may help boost the advances in this field.}
}
Endnote
%0 Conference Paper
%T Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios
%A German Barquero
%A Johnny Núñez
%A Zhen Xu
%A Sergio Escalera
%A Wei-Wei Tu
%A Isabelle Guyon
%A Cristina Palmero
%B Understanding Social Behavior in Dyadic and Small Group Interactions
%C Proceedings of Machine Learning Research
%D 2022
%E Cristina Palmero
%E Julio C. S. Jacques Junior
%E Albert Clapés
%E Isabelle Guyon
%E Wei-Wei Tu
%E Thomas B. Moeslund
%E Sergio Escalera
%F pmlr-v173-barquero22a
%I PMLR
%P 107--138
%U https://proceedings.mlr.press/v173/barquero22a.html
%V 173
%X Human behavior forecasting during human-human interactions is of utmost importance to provide robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the very recently released UDIVA v0.5, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400ms), we outperform the baselines even for a considerably longer-term future (up to 2s). We also show that this finding holds when highly noisy annotations are used, which opens new horizons towards the use of weakly-supervised learning. Combined with large-scale datasets, this may help boost the advances in this field.
APA
Barquero, G., Núñez, J., Xu, Z., Escalera, S., Tu, W., Guyon, I. & Palmero, C. (2022). Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios. Understanding Social Behavior in Dyadic and Small Group Interactions, in Proceedings of Machine Learning Research 173:107-138. Available from https://proceedings.mlr.press/v173/barquero22a.html.