ReDiffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation

Nantian He, Shaohui Li, Zhi Li, Yu Liu, You He
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:17921-17933, 2024.

Abstract

The diffusion model has demonstrated impressive performance in offline reinforcement learning. However, non-deterministic sampling in diffusion models can lead to unstable performance. Furthermore, the lack of confidence measurements makes it difficult to evaluate the reliability and trustworthiness of the sampled decisions. To address these issues, we present ReDiffuser, which utilizes confidence estimation to ensure reliable decision-making. We achieve this by learning a confidence function based on Random Network Distillation. The confidence function measures the reliability of sampled decisions and contributes to quantitative recognition of reliable decisions. Additionally, we integrate the confidence function into task-specific sampling procedures to realize adaptive-horizon planning and value-embedded planning. Experiments show that the proposed ReDiffuser achieves state-of-the-art performance on standard offline RL datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-he24e, title = {{R}e{D}iffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation}, author = {He, Nantian and Li, Shaohui and Li, Zhi and Liu, Yu and He, You}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {17921--17933}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/he24e/he24e.pdf}, url = {https://proceedings.mlr.press/v235/he24e.html}, abstract = {The diffusion model has demonstrated impressive performance in offline reinforcement learning. However, non-deterministic sampling in diffusion models can lead to unstable performance. Furthermore, the lack of confidence measurements makes it difficult to evaluate the reliability and trustworthiness of the sampled decisions. To address these issues, we present ReDiffuser, which utilizes confidence estimation to ensure reliable decision-making. We achieve this by learning a confidence function based on Random Network Distillation. The confidence function measures the reliability of sampled decisions and contributes to quantitative recognition of reliable decisions. Additionally, we integrate the confidence function into task-specific sampling procedures to realize adaptive-horizon planning and value-embedded planning. Experiments show that the proposed ReDiffuser achieves state-of-the-art performance on standard offline RL datasets.} }
Endnote
%0 Conference Paper %T ReDiffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation %A Nantian He %A Shaohui Li %A Zhi Li %A Yu Liu %A You He %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-he24e %I PMLR %P 17921--17933 %U https://proceedings.mlr.press/v235/he24e.html %V 235 %X The diffusion model has demonstrated impressive performance in offline reinforcement learning. However, non-deterministic sampling in diffusion models can lead to unstable performance. Furthermore, the lack of confidence measurements makes it difficult to evaluate the reliability and trustworthiness of the sampled decisions. To address these issues, we present ReDiffuser, which utilizes confidence estimation to ensure reliable decision-making. We achieve this by learning a confidence function based on Random Network Distillation. The confidence function measures the reliability of sampled decisions and contributes to quantitative recognition of reliable decisions. Additionally, we integrate the confidence function into task-specific sampling procedures to realize adaptive-horizon planning and value-embedded planning. Experiments show that the proposed ReDiffuser achieves state-of-the-art performance on standard offline RL datasets.
APA
He, N., Li, S., Li, Z., Liu, Y. & He, Y.. (2024). ReDiffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:17921-17933 Available from https://proceedings.mlr.press/v235/he24e.html.

Related Material