PlanT: Explainable Planning Transformers via Object-Level Representations

Katrin Renz, Kashyap Chitta, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata, Andreas Geiger
Proceedings of The 6th Conference on Robot Learning, PMLR 205:459-470, 2023.

Abstract

Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on imitation learning with a compact object-level input representation. On the Longest6 benchmark for CARLA, PlanT outperforms all prior methods (matching the driving score of the expert) while being 5.3× faster than equivalent pixel-based planning baselines during inference. Combining PlanT with an off-the-shelf perception module provides a sensor-based driving system that is more than 10 points better in terms of driving score than the existing state of the art. Furthermore, we propose an evaluation protocol to quantify the ability of planners to identify relevant objects, providing insights regarding their decision-making. Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.
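The abstract describes PlanT's core design: compact object-level tokens fed to a standard transformer encoder, trained by imitation learning to produce a driving plan. Below is a minimal sketch of that idea in PyTorch; the class name ObjectLevelPlanner, the per-object feature layout, the waypoint output head, and all dimensions are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code): an object-level transformer planner in the
# spirit of PlanT. Each surrounding object (and, in the paper, route segment) becomes
# one token; a standard transformer encoder attends over the tokens; a learned query
# token is decoded into a driving plan, here represented as future (x, y) waypoints.
import torch
import torch.nn as nn

class ObjectLevelPlanner(nn.Module):
    def __init__(self, feat_dim=6, d_model=128, n_heads=4, n_layers=4, n_waypoints=4):
        super().__init__()
        self.n_waypoints = n_waypoints
        self.embed = nn.Linear(feat_dim, d_model)            # per-object attribute embedding
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))   # learned planning query token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.waypoint_head = nn.Linear(d_model, n_waypoints * 2)  # (x, y) per waypoint

    def forward(self, objects):
        # objects: (batch, num_objects, feat_dim), e.g. position, extent, orientation, speed
        tokens = self.embed(objects)
        cls = self.cls.expand(tokens.size(0), -1, -1)
        out = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.waypoint_head(out[:, 0]).view(-1, self.n_waypoints, 2)

# Example: a batch of 2 scenes with 16 object tokens of 6 attributes each.
planner = ObjectLevelPlanner()
waypoints = planner(torch.randn(2, 16, 6))
print(waypoints.shape)  # torch.Size([2, 4, 2])

Because the input is a short sequence of object tokens rather than a dense grid, the attention weights of such a model can also be inspected per object, which is the basis for the relevance analysis the abstract mentions.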

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-renz23a,
  title     = {PlanT: Explainable Planning Transformers via Object-Level Representations},
  author    = {Renz, Katrin and Chitta, Kashyap and Mercea, Otniel-Bogdan and Koepke, A. Sophia and Akata, Zeynep and Geiger, Andreas},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {459--470},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/renz23a/renz23a.pdf},
  url       = {https://proceedings.mlr.press/v205/renz23a.html},
  abstract  = {Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on imitation learning with a compact object-level input representation. On the Longest6 benchmark for CARLA, PlanT outperforms all prior methods (matching the driving score of the expert) while being 5.3× faster than equivalent pixel-based planning baselines during inference. Combining PlanT with an off-the-shelf perception module provides a sensor-based driving system that is more than 10 points better in terms of driving score than the existing state of the art. Furthermore, we propose an evaluation protocol to quantify the ability of planners to identify relevant objects, providing insights regarding their decision-making. Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.}
}
Endnote
%0 Conference Paper
%T PlanT: Explainable Planning Transformers via Object-Level Representations
%A Katrin Renz
%A Kashyap Chitta
%A Otniel-Bogdan Mercea
%A A. Sophia Koepke
%A Zeynep Akata
%A Andreas Geiger
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-renz23a
%I PMLR
%P 459--470
%U https://proceedings.mlr.press/v205/renz23a.html
%V 205
%X Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on imitation learning with a compact object-level input representation. On the Longest6 benchmark for CARLA, PlanT outperforms all prior methods (matching the driving score of the expert) while being 5.3× faster than equivalent pixel-based planning baselines during inference. Combining PlanT with an off-the-shelf perception module provides a sensor-based driving system that is more than 10 points better in terms of driving score than the existing state of the art. Furthermore, we propose an evaluation protocol to quantify the ability of planners to identify relevant objects, providing insights regarding their decision-making. Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.
APA
Renz, K., Chitta, K., Mercea, O., Koepke, A.S., Akata, Z. & Geiger, A. (2023). PlanT: Explainable Planning Transformers via Object-Level Representations. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:459-470. Available from https://proceedings.mlr.press/v205/renz23a.html.