Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving

Kairui Ding, Boyuan Chen, Yuchen Su, Huan-ang Gao, Bu Jin, Chonghao Sima, Xiaohui Li, Wuqiang Zhang, Paul Barsch, Hongyang Li, Hao Zhao
Proceedings of The 8th Conference on Robot Learning, PMLR 270:3742-3765, 2025.

Abstract

End-to-end architectures in autonomous driving (AD) face a significant challenge in interpretability, impeding human-AI trust. Human-friendly natural language has been explored for tasks such as driving explanation and 3D captioning. However, previous works primarily focused on the paradigm of declarative interpretability, where the natural language interpretations are not grounded in the intermediate outputs of AD systems, making the interpretations only declarative. In contrast, aligned interpretability establishes a connection between language and the intermediate outputs of AD systems. Here we introduce Hint-AD, an integrated AD-language system that generates language aligned with the holistic perception-prediction-planning outputs of the AD model. By incorporating the intermediate outputs and a holistic token mixer sub-network for effective feature adaptation, Hint-AD achieves desirable accuracy, attaining state-of-the-art results in driving language tasks including driving explanation, 3D dense captioning, and command prediction. To facilitate further study of the driving explanation task on nuScenes, we also introduce a human-labeled dataset, Nu-X. Code, dataset, and models are publicly available at https://anonymous.4open.science/r/Hint-AD-1385/.

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-ding25a,
  title     = {Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving},
  author    = {Ding, Kairui and Chen, Boyuan and Su, Yuchen and Gao, Huan-ang and Jin, Bu and Sima, Chonghao and Li, Xiaohui and Zhang, Wuqiang and Barsch, Paul and Li, Hongyang and Zhao, Hao},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {3742--3765},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/ding25a/ding25a.pdf},
  url       = {https://proceedings.mlr.press/v270/ding25a.html},
  abstract  = {End-to-end architectures in autonomous driving (AD) face a significant challenge in interpretability, impeding human-AI trust. Human-friendly natural language has been explored for tasks such as driving explanation and 3D captioning. However, previous works primarily focused on the paradigm of declarative interpretability, where the natural language interpretations are not grounded in the intermediate outputs of AD systems, making the interpretations only declarative. In contrast, aligned interpretability establishes a connection between language and the intermediate outputs of AD systems. Here we introduce Hint-AD, an integrated AD-language system that generates language aligned with the holistic perception-prediction-planning outputs of the AD model. By incorporating the intermediate outputs and a holistic token mixer sub-network for effective feature adaptation, Hint-AD achieves desirable accuracy, attaining state-of-the-art results in driving language tasks including driving explanation, 3D dense captioning, and command prediction. To facilitate further study of the driving explanation task on nuScenes, we also introduce a human-labeled dataset, Nu-X. Code, dataset, and models are publicly available at https://anonymous.4open.science/r/Hint-AD-1385/.}
}
Endnote
%0 Conference Paper
%T Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving
%A Kairui Ding
%A Boyuan Chen
%A Yuchen Su
%A Huan-ang Gao
%A Bu Jin
%A Chonghao Sima
%A Xiaohui Li
%A Wuqiang Zhang
%A Paul Barsch
%A Hongyang Li
%A Hao Zhao
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-ding25a
%I PMLR
%P 3742--3765
%U https://proceedings.mlr.press/v270/ding25a.html
%V 270
%X End-to-end architectures in autonomous driving (AD) face a significant challenge in interpretability, impeding human-AI trust. Human-friendly natural language has been explored for tasks such as driving explanation and 3D captioning. However, previous works primarily focused on the paradigm of declarative interpretability, where the natural language interpretations are not grounded in the intermediate outputs of AD systems, making the interpretations only declarative. In contrast, aligned interpretability establishes a connection between language and the intermediate outputs of AD systems. Here we introduce Hint-AD, an integrated AD-language system that generates language aligned with the holistic perception-prediction-planning outputs of the AD model. By incorporating the intermediate outputs and a holistic token mixer sub-network for effective feature adaptation, Hint-AD achieves desirable accuracy, attaining state-of-the-art results in driving language tasks including driving explanation, 3D dense captioning, and command prediction. To facilitate further study of the driving explanation task on nuScenes, we also introduce a human-labeled dataset, Nu-X. Code, dataset, and models are publicly available at https://anonymous.4open.science/r/Hint-AD-1385/.
APA
Ding, K., Chen, B., Su, Y., Gao, H., Jin, B., Sima, C., Li, X., Zhang, W., Barsch, P., Li, H., & Zhao, H. (2025). Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:3742-3765. Available from https://proceedings.mlr.press/v270/ding25a.html.