Motion Question Answering via Modular Motion Programs

Mark Endo, Joy Hsu, Jiaman Li, Jiajun Wu
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:9312-9328, 2023.

Abstract

In order to build artificial intelligence systems that can perceive and reason with human behavior in the real world, we must first design models that conduct complex spatio-temporal reasoning over motion sequences. Moving towards this goal, we propose the HumanMotionQA task to evaluate complex, multi-step reasoning abilities of models on long-form human motion sequences. We generate a dataset of question-answer pairs that require detecting motor cues in small portions of motion sequences, reasoning temporally about when events occur, and querying specific motion attributes. In addition, we propose NSPose, a neuro-symbolic method for this task that uses symbolic reasoning and a modular design to ground motion through learning motion concepts, attribute neural operators, and temporal relations. We demonstrate the suitability of NSPose for the HumanMotionQA task, outperforming all baseline methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-endo23a,
  title = {Motion Question Answering via Modular Motion Programs},
  author = {Endo, Mark and Hsu, Joy and Li, Jiaman and Wu, Jiajun},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {9312--9328},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/endo23a/endo23a.pdf},
  url = {https://proceedings.mlr.press/v202/endo23a.html},
  abstract = {In order to build artificial intelligence systems that can perceive and reason with human behavior in the real world, we must first design models that conduct complex spatio-temporal reasoning over motion sequences. Moving towards this goal, we propose the HumanMotionQA task to evaluate complex, multi-step reasoning abilities of models on long-form human motion sequences. We generate a dataset of question-answer pairs that require detecting motor cues in small portions of motion sequences, reasoning temporally about when events occur, and querying specific motion attributes. In addition, we propose NSPose, a neuro-symbolic method for this task that uses symbolic reasoning and a modular design to ground motion through learning motion concepts, attribute neural operators, and temporal relations. We demonstrate the suitability of NSPose for the HumanMotionQA task, outperforming all baseline methods.}
}
Endnote
%0 Conference Paper
%T Motion Question Answering via Modular Motion Programs
%A Mark Endo
%A Joy Hsu
%A Jiaman Li
%A Jiajun Wu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-endo23a
%I PMLR
%P 9312--9328
%U https://proceedings.mlr.press/v202/endo23a.html
%V 202
%X In order to build artificial intelligence systems that can perceive and reason with human behavior in the real world, we must first design models that conduct complex spatio-temporal reasoning over motion sequences. Moving towards this goal, we propose the HumanMotionQA task to evaluate complex, multi-step reasoning abilities of models on long-form human motion sequences. We generate a dataset of question-answer pairs that require detecting motor cues in small portions of motion sequences, reasoning temporally about when events occur, and querying specific motion attributes. In addition, we propose NSPose, a neuro-symbolic method for this task that uses symbolic reasoning and a modular design to ground motion through learning motion concepts, attribute neural operators, and temporal relations. We demonstrate the suitability of NSPose for the HumanMotionQA task, outperforming all baseline methods.
APA
Endo, M., Hsu, J., Li, J. & Wu, J. (2023). Motion Question Answering via Modular Motion Programs. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:9312-9328. Available from https://proceedings.mlr.press/v202/endo23a.html.
