End-to-end Active Object Tracking via Reinforcement Learning

Wenhan Luo, Peng Sun, Fangwei Zhong, Wei Liu, Tong Zhang, Yizhou Wang
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3286-3295, 2018.

Abstract

We study active object tracking, where a tracker takes a visual observation (i.e., a frame sequence) as input and produces a camera control signal (e.g., move forward, turn left, etc.) as output. Conventional methods tackle tracking and camera control separately, which makes joint tuning challenging and incurs substantial human effort for labeling as well as expensive trial-and-error in the real world. To address these issues, we propose an end-to-end solution via deep reinforcement learning, in which a ConvNet-LSTM function approximator is adopted for direct frame-to-action prediction. We further propose an environment augmentation technique and a customized reward function, both of which are crucial for successful training. The tracker trained in simulators (ViZDoom, Unreal Engine) generalizes well to unseen object moving paths, unseen object appearances, unseen backgrounds, and distracting objects. It can recover tracking when it occasionally loses the target. Through experiments on the VOT dataset, we also find that the tracking ability, obtained solely from simulators, can potentially transfer to real-world scenarios.
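The abstract mentions a customized reward function that is crucial for training: intuitively, the tracker is rewarded for keeping the target centered in view at a desired distance. A minimal sketch of this kind of reward shaping, in a top-down camera-local frame (the constants `d`, `c`, `lam`, `A` and the exact functional form here are illustrative assumptions, not the paper's exact formula):

```python
import math

def tracking_reward(x, y, angle, d=2.0, c=1.0, lam=0.5, A=1.0):
    """Illustrative shaped reward for active tracking (hypothetical form).

    x, y  : target position in the camera's local frame (y = forward).
    angle : target's angular offset from the camera axis, in radians.
    The reward is maximal (A) when the target sits exactly d units
    straight ahead with zero angular offset, and decreases with both
    positional deviation and angular offset.
    """
    positional_error = math.sqrt(x * x + (y - d) ** 2)
    return A - (positional_error / c + lam * abs(angle))

# Peak reward when the target is exactly d ahead and centered:
print(tracking_reward(0.0, 2.0, 0.0))  # 1.0
```

A dense reward of this shape gives the policy a gradient signal at every step, rather than only a sparse success/failure signal, which is one reason such shaping matters for training frame-to-action trackers.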

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-luo18a,
  title     = {End-to-end Active Object Tracking via Reinforcement Learning},
  author    = {Luo, Wenhan and Sun, Peng and Zhong, Fangwei and Liu, Wei and Zhang, Tong and Wang, Yizhou},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {3286--3295},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/luo18a/luo18a.pdf},
  url       = {http://proceedings.mlr.press/v80/luo18a.html},
  abstract  = {We study active object tracking, where a tracker takes a visual observation (i.e., a frame sequence) as input and produces a camera control signal (e.g., move forward, turn left, etc.) as output. Conventional methods tackle tracking and camera control separately, which makes joint tuning challenging and incurs substantial human effort for labeling as well as expensive trial-and-error in the real world. To address these issues, we propose an end-to-end solution via deep reinforcement learning, in which a ConvNet-LSTM function approximator is adopted for direct frame-to-action prediction. We further propose an environment augmentation technique and a customized reward function, both of which are crucial for successful training. The tracker trained in simulators (ViZDoom, Unreal Engine) generalizes well to unseen object moving paths, unseen object appearances, unseen backgrounds, and distracting objects. It can recover tracking when it occasionally loses the target. Through experiments on the VOT dataset, we also find that the tracking ability, obtained solely from simulators, can potentially transfer to real-world scenarios.}
}
Endnote
%0 Conference Paper
%T End-to-end Active Object Tracking via Reinforcement Learning
%A Wenhan Luo
%A Peng Sun
%A Fangwei Zhong
%A Wei Liu
%A Tong Zhang
%A Yizhou Wang
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-luo18a
%I PMLR
%P 3286--3295
%U http://proceedings.mlr.press/v80/luo18a.html
%V 80
%X We study active object tracking, where a tracker takes a visual observation (i.e., a frame sequence) as input and produces a camera control signal (e.g., move forward, turn left, etc.) as output. Conventional methods tackle tracking and camera control separately, which makes joint tuning challenging and incurs substantial human effort for labeling as well as expensive trial-and-error in the real world. To address these issues, we propose an end-to-end solution via deep reinforcement learning, in which a ConvNet-LSTM function approximator is adopted for direct frame-to-action prediction. We further propose an environment augmentation technique and a customized reward function, both of which are crucial for successful training. The tracker trained in simulators (ViZDoom, Unreal Engine) generalizes well to unseen object moving paths, unseen object appearances, unseen backgrounds, and distracting objects. It can recover tracking when it occasionally loses the target. Through experiments on the VOT dataset, we also find that the tracking ability, obtained solely from simulators, can potentially transfer to real-world scenarios.
APA
Luo, W., Sun, P., Zhong, F., Liu, W., Zhang, T. & Wang, Y. (2018). End-to-end Active Object Tracking via Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:3286-3295. Available from http://proceedings.mlr.press/v80/luo18a.html.

Related Material