CoastalBench: A Decade-Long High-Resolution Dataset to Emulate Complex Coastal Processes

Zelin Xu, Yupu Zhang, Tingsong Xiao, Maitane Olabarrieta Lizaso, Jose M. Gonzalez-Ondina, Zibo Liu, Shigang Chen, Zhe Jiang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:69908-69920, 2025.

Abstract

Over 40% of the global population lives within 100 kilometers of the coast, which contributes more than $8 trillion annually to the global economy. Unfortunately, coastal ecosystems are increasingly vulnerable to more frequent and intense extreme weather events and rising sea levels. Coastal scientists use numerical models to simulate complex physical processes, but these models are often slow and expensive. In recent years, deep learning has become a promising alternative to reduce the cost of numerical models. However, progress has been hindered by the lack of a large-scale, high-resolution coastal simulation dataset to train and validate deep learning models. Existing studies often focus on relatively small datasets and simple processes. To fill this gap, we introduce a decade-long, high-resolution (<100 m) coastal circulation modeling dataset on a real-world 3D mesh in southwest Florida with around 6 million cells. The dataset contains key oceanography variables (e.g., current velocities, free surface level, temperature, salinity) alongside external atmospheric and river forcings. We evaluated a customized Vision Transformer model that takes initial and boundary conditions and external forcings and predicts ocean variables at varying lead times. The dataset provides an opportunity to benchmark novel deep learning models for high-resolution coastal simulations (e.g., physics-informed machine learning, neural operator learning). The code and dataset can be accessed at https://github.com/spatialdatasciencegroup/CoastalBench.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-xu25ak,
  title     = {{C}oastal{B}ench: A Decade-Long High-Resolution Dataset to Emulate Complex Coastal Processes},
  author    = {Xu, Zelin and Zhang, Yupu and Xiao, Tingsong and Lizaso, Maitane Olabarrieta and Gonzalez-Ondina, Jose M. and Liu, Zibo and Chen, Shigang and Jiang, Zhe},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {69908--69920},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/xu25ak/xu25ak.pdf},
  url       = {https://proceedings.mlr.press/v267/xu25ak.html},
  abstract  = {Over 40\% of the global population lives within 100 kilometers of the coast, which contributes more than \$8 trillion annually to the global economy. Unfortunately, coastal ecosystems are increasingly vulnerable to more frequent and intense extreme weather events and rising sea levels. Coastal scientists use numerical models to simulate complex physical processes, but these models are often slow and expensive. In recent years, deep learning has become a promising alternative to reduce the cost of numerical models. However, progress has been hindered by the lack of a large-scale, high-resolution coastal simulation dataset to train and validate deep learning models. Existing studies often focus on relatively small datasets and simple processes. To fill this gap, we introduce a decade-long, high-resolution ($<$100m) coastal circulation modeling dataset on a real-world 3D mesh in southwest Florida with around 6 million cells. The dataset contains key oceanography variables (e.g., current velocities, free surface level, temperature, salinity) alongside external atmospheric and river forcings. We evaluated a customized Vision Transformer model that takes initial and boundary conditions and external forcings and predicts ocean variables at varying lead times. The dataset provides an opportunity to benchmark novel deep learning models for high-resolution coastal simulations (e.g., physics-informed machine learning, neural operator learning). The code and dataset can be accessed at https://github.com/spatialdatasciencegroup/CoastalBench.}
}
APA
Xu, Z., Zhang, Y., Xiao, T., Lizaso, M.O., Gonzalez-Ondina, J.M., Liu, Z., Chen, S. & Jiang, Z. (2025). CoastalBench: A Decade-Long High-Resolution Dataset to Emulate Complex Coastal Processes. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:69908-69920. Available from https://proceedings.mlr.press/v267/xu25ak.html.
