DiffDA: a Diffusion model for weather-scale Data Assimilation

Langwen Huang, Lukas Gianinazzi, Yuejiang Yu, Peter Dominik Dueben, Torsten Hoefler
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:19798-19815, 2024.

Abstract

The generation of initial conditions via accurate data assimilation is crucial for weather forecasting and climate modeling. We propose DiffDA, a denoising diffusion model that assimilates atmospheric variables using predicted states and sparse observations. Exploiting the similarity between a weather forecast model and a denoising diffusion model dedicated to weather applications, we adapt the pretrained GraphCast neural network as the backbone of the diffusion model. In experiments based on simulated observations from the ERA5 reanalysis dataset, our method produces assimilated global atmospheric data consistent with observations at 0.25$^\circ$ ($\approx$30 km) resolution, the highest resolution achieved by ML data assimilation models to date. The experiments also show that initial conditions assimilated from sparse observations (less than 0.96% of gridded data) and a 48-hour forecast can be used by forecast models with a loss of lead time of at most 24 hours compared to initial conditions from state-of-the-art data assimilation in ERA5. This enables real-world applications of the method, such as creating reanalysis datasets with autoregressive data assimilation.
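To make the idea in the abstract concrete, the following is a minimal, hypothetical sketch of diffusion-based data assimilation: a reverse-diffusion sampling loop conditioned on a prior forecast, with sparse observations enforced on the observed grid points during sampling. The denoiser here is a crude stand-in for the adapted GraphCast backbone, and the hard-mask (inpainting-style) conditioning is an illustrative choice under stated assumptions, not necessarily the paper's exact mechanism.

# Hypothetical sketch: conditional diffusion sampling for data assimilation.
# The "denoiser" is a placeholder for a learned model (e.g., adapted GraphCast).
import numpy as np

rng = np.random.default_rng(0)

T = 50                                   # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, forecast, t):
    """Stand-in for a learned noise predictor conditioned on the forecast.
    A real backbone would take gridded atmospheric fields; this crude
    estimate only exists so the sampling loop runs end to end."""
    return (x_t - forecast) / np.sqrt(1.0 - alpha_bars[t])

def assimilate(forecast, obs, obs_mask):
    """Draw one analysis sample consistent with sparse observations."""
    x = rng.standard_normal(forecast.shape)          # start from pure noise
    for t in reversed(range(T)):
        eps = denoiser(x, forecast, t)
        # DDPM-style posterior mean update
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
            # inpainting-style conditioning: overwrite observed grid points
            # with observations noised to the current diffusion level
            noised_obs = (np.sqrt(alpha_bars[t - 1]) * obs
                          + np.sqrt(1.0 - alpha_bars[t - 1])
                          * rng.standard_normal(obs.shape))
            x = np.where(obs_mask, noised_obs, x)
    return np.where(obs_mask, obs, x)                # clamp to observations at t=0

# Toy example: a 2D field with roughly 1% of grid points observed.
truth = rng.standard_normal((32, 64))
forecast = truth + 0.3 * rng.standard_normal(truth.shape)
obs_mask = rng.random(truth.shape) < 0.01
analysis = assimilate(forecast, np.where(obs_mask, truth, 0.0), obs_mask)
print("RMSE forecast:", np.sqrt(((forecast - truth) ** 2).mean()))
print("RMSE analysis:", np.sqrt(((analysis - truth) ** 2).mean()))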

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-huang24h,
  title     = {{D}iff{DA}: a Diffusion model for weather-scale Data Assimilation},
  author    = {Huang, Langwen and Gianinazzi, Lukas and Yu, Yuejiang and Dueben, Peter Dominik and Hoefler, Torsten},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {19798--19815},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/huang24h/huang24h.pdf},
  url       = {https://proceedings.mlr.press/v235/huang24h.html}
}
APA
Huang, L., Gianinazzi, L., Yu, Y., Dueben, P.D. & Hoefler, T. (2024). DiffDA: a Diffusion model for weather-scale Data Assimilation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:19798-19815. Available from https://proceedings.mlr.press/v235/huang24h.html.
