Animal pose estimation from video data with a hierarchical von Mises-Fisher-Gaussian model

Libby Zhang, Tim Dunn, Jesse Marshall, Bence Olveczky, Scott Linderman
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2800-2808, 2021.

Abstract

Animal pose estimation from video data is an important step in many biological studies, but current methods struggle in complex environments where occlusions are common and training data is scarce. Recent work has demonstrated improved accuracy with deep neural networks, but these methods often do not incorporate prior distributions that could improve localization. Here we present GIMBAL: a hierarchical von Mises-Fisher-Gaussian model that improves upon deep networks’ estimates by leveraging spatiotemporal constraints. The spatial constraints come from the animal’s skeleton, which induces a curved manifold of keypoint configurations. The temporal constraints come from the postural dynamics, which govern how angles between keypoints change over time. Importantly, the conditional conjugacy of the model permits simple and efficient Bayesian inference algorithms. We assess the model on a unique experimental dataset with video of a freely-behaving rodent from multiple viewpoints and ground-truth motion capture data for 20 keypoints. GIMBAL extends existing techniques, and in doing so offers more accurate estimates of keypoint positions, especially in challenging contexts.
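
The abstract describes the generative idea only at a high level. Below is a minimal, illustrative sketch (not the paper's implementation) of how a von Mises-Fisher prior over bone directions can combine with Gaussian observation noise along a skeleton. The names and numbers (sample_vmf_3d, the four-keypoint chain, the bone lengths, kappa, obs_sigma) are hypothetical, and the sketch omits GIMBAL's hierarchical latent states and temporal dynamics.

import numpy as np

def sample_vmf_3d(mu, kappa, rng):
    # Sample a unit vector in R^3 from a von Mises-Fisher distribution
    # with mean direction mu (unit norm) and concentration kappa > 0.
    u = rng.uniform()
    # Closed-form inverse CDF for the cosine of the angle to mu (3-D case).
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    # Uniformly random direction in the plane orthogonal to mu.
    v = rng.standard_normal(3)
    v -= (v @ mu) * mu
    v /= np.linalg.norm(v)
    return w * mu + np.sqrt(max(1.0 - w ** 2, 0.0)) * v

# Hypothetical 4-keypoint chain: head -> neck -> spine -> tail base.
parents      = [-1, 0, 1, 2]                  # parent index per keypoint (-1 = root)
bone_lengths = [0.0, 3.0, 5.0, 4.0]           # cm; the root has no bone
mean_dirs    = np.array([[0., 0.,  0.],       # unused for the root
                         [0., 0., -1.],
                         [-1., 0., 0.],
                         [-1., 0., 0.]])
kappa     = 8.0                               # directional concentration
obs_sigma = 0.5                               # Gaussian observation noise (cm)

rng = np.random.default_rng(0)
positions = np.zeros((4, 3))
observed  = np.zeros((4, 3))
for k in range(4):
    if parents[k] >= 0:
        # Bone direction drawn from a vMF prior, scaled by the bone length.
        mu = mean_dirs[k] / np.linalg.norm(mean_dirs[k])
        d = sample_vmf_3d(mu, kappa, rng)
        positions[k] = positions[parents[k]] + bone_lengths[k] * d
    # Observed keypoint = true position plus isotropic Gaussian noise.
    observed[k] = positions[k] + obs_sigma * rng.standard_normal(3)
print(observed)

The skeleton enters only through the parent pointers and bone lengths, which is what constrains the keypoints to a curved manifold of plausible configurations; the actual model additionally ties the vMF mean directions to latent heading and pose variables that evolve over time.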

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-zhang21h,
  title     = {Animal pose estimation from video data with a hierarchical von Mises-Fisher-Gaussian model},
  author    = {Zhang, Libby and Dunn, Tim and Marshall, Jesse and Olveczky, Bence and Linderman, Scott},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {2800--2808},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/zhang21h/zhang21h.pdf},
  url       = {https://proceedings.mlr.press/v130/zhang21h.html},
  abstract  = {Animal pose estimation from video data is an important step in many biological studies, but current methods struggle in complex environments where occlusions are common and training data is scarce. Recent work has demonstrated improved accuracy with deep neural networks, but these methods often do not incorporate prior distributions that could improve localization. Here we present GIMBAL: a hierarchical von Mises-Fisher-Gaussian model that improves upon deep networks’ estimates by leveraging spatiotemporal constraints. The spatial constraints come from the animal’s skeleton, which induces a curved manifold of keypoint configurations. The temporal constraints come from the postural dynamics, which govern how angles between keypoints change over time. Importantly, the conditional conjugacy of the model permits simple and efficient Bayesian inference algorithms. We assess the model on a unique experimental dataset with video of a freely-behaving rodent from multiple viewpoints and ground-truth motion capture data for 20 keypoints. GIMBAL extends existing techniques, and in doing so offers more accurate estimates of keypoint positions, especially in challenging contexts.}
}
Endnote
%0 Conference Paper
%T Animal pose estimation from video data with a hierarchical von Mises-Fisher-Gaussian model
%A Libby Zhang
%A Tim Dunn
%A Jesse Marshall
%A Bence Olveczky
%A Scott Linderman
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-zhang21h
%I PMLR
%P 2800--2808
%U https://proceedings.mlr.press/v130/zhang21h.html
%V 130
%X Animal pose estimation from video data is an important step in many biological studies, but current methods struggle in complex environments where occlusions are common and training data is scarce. Recent work has demonstrated improved accuracy with deep neural networks, but these methods often do not incorporate prior distributions that could improve localization. Here we present GIMBAL: a hierarchical von Mises-Fisher-Gaussian model that improves upon deep networks’ estimates by leveraging spatiotemporal constraints. The spatial constraints come from the animal’s skeleton, which induces a curved manifold of keypoint configurations. The temporal constraints come from the postural dynamics, which govern how angles between keypoints change over time. Importantly, the conditional conjugacy of the model permits simple and efficient Bayesian inference algorithms. We assess the model on a unique experimental dataset with video of a freely-behaving rodent from multiple viewpoints and ground-truth motion capture data for 20 keypoints. GIMBAL extends existing techniques, and in doing so offers more accurate estimates of keypoint positions, especially in challenging contexts.
APA
Zhang, L., Dunn, T., Marshall, J., Olveczky, B. & Linderman, S. (2021). Animal pose estimation from video data with a hierarchical von Mises-Fisher-Gaussian model. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2800-2808. Available from https://proceedings.mlr.press/v130/zhang21h.html.