Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation

Hema Koppula; Ashutosh Saxena

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):792-800, 2013.

Abstract

We consider the problem of detecting past activities as well as anticipating which activity will happen in the future and how. We start by modeling the rich spatio-temporal relations between human poses and objects (called affordances) using a conditional random field (CRF). However, because of the ambiguity in the temporal segmentation of the sub-activities that constitute an activity, in the past as well as in the future, multiple graph structures are possible. In this paper, we reason about these alternate possibilities by reasoning over multiple possible graph structures. We obtain them by approximating the graph with only additive features, which lends itself to efficient dynamic programming. Starting with this proposal graph structure, we then design moves to obtain several other likely graph structures. We then show that our approach improves the state-of-the-art significantly for detecting past activities as well as for anticipating future activities, on a dataset of 120 activity videos collected from four subjects.

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-koppula13,
  title = 	 {Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation},
  author = 	 {Koppula, Hema and Saxena, Ashutosh},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {792--800},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/koppula13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/koppula13.html},
  abstract = 	 {We consider the problem of detecting past activities as well as anticipating which activity will happen in the future and how. We start by modeling the rich spatio-temporal relations between human poses and objects (called affordances) using a conditional random field (CRF). However, because of the ambiguity in the temporal segmentation of the sub-activities that constitute an activity, in the past as well as in the future, multiple graph structures are possible. In this paper, we reason about these alternate possibilities by reasoning over multiple possible graph structures. We obtain them by approximating the graph with only additive features, which lends to efficient dynamic programming. Starting with this proposal graph structure, we then design moves to obtain several other likely graph structures. We then show that our approach improves the state-of-the-art significantly for detecting past activities as well as for anticipating future activities, on a dataset of 120 activity videos collected from four subjects.}
}

Endnote

%0 Conference Paper
%T Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation
%A Hema Koppula
%A Ashutosh Saxena
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-koppula13
%I PMLR
%P 792--800
%U https://proceedings.mlr.press/v28/koppula13.html
%V 28
%N 3
%X We consider the problem of detecting past activities as well as anticipating which activity will happen in the future and how. We start by modeling the rich spatio-temporal relations between human poses and objects (called affordances) using a conditional random field (CRF). However, because of the ambiguity in the temporal segmentation of the sub-activities that constitute an activity, in the past as well as in the future, multiple graph structures are possible. In this paper, we reason about these alternate possibilities by reasoning over multiple possible graph structures. We obtain them by approximating the graph with only additive features, which lends to efficient dynamic programming. Starting with this proposal graph structure, we then design moves to obtain several other likely graph structures. We then show that our approach improves the state-of-the-art significantly for detecting past activities as well as for anticipating future activities, on a dataset of 120 activity videos collected from four subjects.

RIS


TY  - CPAPER
TI  - Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation
AU  - Hema Koppula
AU  - Ashutosh Saxena
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-koppula13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 792
EP  - 800
L1  - http://proceedings.mlr.press/v28/koppula13.pdf
UR  - https://proceedings.mlr.press/v28/koppula13.html
AB  - We consider the problem of detecting past activities as well as anticipating which activity will happen in the future and how. We start by modeling the rich spatio-temporal relations between human poses and objects (called affordances) using a conditional random field (CRF). However, because of the ambiguity in the temporal segmentation of the sub-activities that constitute an activity, in the past as well as in the future, multiple graph structures are possible. In this paper, we reason about these alternate possibilities by reasoning over multiple possible graph structures. We obtain them by approximating the graph with only additive features, which lends to efficient dynamic programming. Starting with this proposal graph structure, we then design moves to obtain several other likely graph structures. We then show that our approach improves the state-of-the-art significantly for detecting past activities as well as for anticipating future activities, on a dataset of 120 activity videos collected from four subjects.
ER  -

APA


Koppula, H. & Saxena, A. (2013). Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):792-800. Available from https://proceedings.mlr.press/v28/koppula13.html.
