Optimizing Cloud-to-GPU Throughput for Deep Learning With Earth Observation Data

Akram Zaytar, Caleb Robinson, Girmaw Abebe Tadesse, Tammy Glazer, Gilles Hacheme, Anthony Ortiz, Rahul M Dodhia, Juan M Lavista Ferres
Proceedings of The TerraBytes ICML Workshop: Towards global datasets and models for Earth Observation, PMLR 292:97-110, 2025.

Abstract

Training deep learning models on petabyte-scale Earth Observation (EO) data requires separating compute resources from data storage. However, standard PyTorch data loaders cannot keep modern GPUs utilized when streaming GeoTIFF files directly from cloud storage. In this work, we benchmark GeoTIFF loading throughput from both cloud object storage and local SSD, systematically testing different loader configurations and data parameters. We focus on tile-aligned reads and worker thread pools, using Bayesian optimization to find optimal settings for each storage type. Our optimized configurations increase remote data loading throughput by 20$\times$ and local throughput by 4$\times$ compared to default settings. On three public EO benchmarks, models trained with optimized remote loading achieve the same accuracy as local training within identical time budgets. We improve validation IoU by $6$–$15$% and maintain $85$–$95$% GPU utilization versus $0$–$30$% with standard configurations. Code is publicly available at https://github.com/microsoft/pytorch-cloud-geotiff-optimization.
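As a rough illustration of why the tile-aligned reads mentioned in the abstract matter (a sketch based only on the abstract's description, not the paper's actual implementation): a GeoTIFF with internal tiling serves a read window by fetching every internal tile the window overlaps, and when streaming from object storage each overlapped tile typically costs a separate range request. A tile-sized window that is misaligned with the tile grid can therefore touch up to four tiles where an aligned one touches exactly one. Assuming a hypothetical 512×512 internal tile layout:

```python
import math


def tiles_touched(col_off: int, row_off: int, width: int, height: int,
                  tile_size: int = 512) -> int:
    """Count how many internal tiles of a tiled GeoTIFF a read window overlaps.

    When the file is streamed from cloud object storage, each overlapped
    tile typically translates into one HTTP range request.
    """
    cols = math.ceil((col_off + width) / tile_size) - col_off // tile_size
    rows = math.ceil((row_off + height) / tile_size) - row_off // tile_size
    return cols * rows


# A tile-aligned 512x512 window overlaps exactly one internal tile,
# while the same-sized window shifted off the grid by 12 pixels
# overlaps four tiles (2 columns x 2 rows), quadrupling the requests.
print(tiles_touched(512, 512, 512, 512))  # -> 1
print(tiles_touched(500, 500, 512, 512))  # -> 4
```

This is why aligning sample windows to the file's internal tile grid, as the paper benchmarks, can sharply reduce the number of round trips per training sample.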

Cite this Paper


BibTeX
@InProceedings{pmlr-v292-zaytar25a,
  title = {Optimizing Cloud-to-{GPU} Throughput for Deep Learning With Earth Observation Data},
  author = {Zaytar, Akram and Robinson, Caleb and Tadesse, Girmaw Abebe and Glazer, Tammy and Hacheme, Gilles and Ortiz, Anthony and Dodhia, Rahul M and Lavista Ferres, Juan M},
  booktitle = {Proceedings of The TerraBytes {ICML} Workshop: Towards global datasets and models for Earth Observation},
  pages = {97--110},
  year = {2025},
  editor = {Audebert, Nicolas and Azizpour, Hossein and Barrière, Valentin and Castillo Navarro, Javiera and Czerkawski, Mikolaj and Fang, Heng and Francis, Alistair and Marsocci, Valerio and Nascetti, Andrea and Yadav, Ritu},
  volume = {292},
  series = {Proceedings of Machine Learning Research},
  month = {19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v292/main/assets/zaytar25a/zaytar25a.pdf},
  url = {https://proceedings.mlr.press/v292/zaytar25a.html},
  abstract = {Training deep learning models on petabyte-scale Earth Observation (EO) data requires separating compute resources from data storage. However, standard PyTorch data loaders cannot keep modern GPUs utilized when streaming GeoTIFF files directly from cloud storage. In this work, we benchmark GeoTIFF loading throughput from both cloud object storage and local SSD, systematically testing different loader configurations and data parameters. We focus on tile-aligned reads and worker thread pools, using Bayesian optimization to find optimal settings for each storage type. Our optimized configurations increase remote data loading throughput by 20$\times$ and local throughput by 4$\times$ compared to default settings. On three public EO benchmarks, models trained with optimized remote loading achieve the same accuracy as local training within identical time budgets. We improve validation IoU by $6$--$15$% and maintain $85$--$95$% GPU utilization versus $0$--$30$% with standard configurations. Code is publicly available at \url{https://github.com/microsoft/pytorch-cloud-geotiff-optimization}.}
}
Endnote
%0 Conference Paper
%T Optimizing Cloud-to-GPU Throughput for Deep Learning With Earth Observation Data
%A Akram Zaytar
%A Caleb Robinson
%A Girmaw Abebe Tadesse
%A Tammy Glazer
%A Gilles Hacheme
%A Anthony Ortiz
%A Rahul M Dodhia
%A Juan M Lavista Ferres
%B Proceedings of The TerraBytes ICML Workshop: Towards global datasets and models for Earth Observation
%C Proceedings of Machine Learning Research
%D 2025
%E Nicolas Audebert
%E Hossein Azizpour
%E Valentin Barrière
%E Javiera Castillo Navarro
%E Mikolaj Czerkawski
%E Heng Fang
%E Alistair Francis
%E Valerio Marsocci
%E Andrea Nascetti
%E Ritu Yadav
%F pmlr-v292-zaytar25a
%I PMLR
%P 97--110
%U https://proceedings.mlr.press/v292/zaytar25a.html
%V 292
%X Training deep learning models on petabyte-scale Earth Observation (EO) data requires separating compute resources from data storage. However, standard PyTorch data loaders cannot keep modern GPUs utilized when streaming GeoTIFF files directly from cloud storage. In this work, we benchmark GeoTIFF loading throughput from both cloud object storage and local SSD, systematically testing different loader configurations and data parameters. We focus on tile-aligned reads and worker thread pools, using Bayesian optimization to find optimal settings for each storage type. Our optimized configurations increase remote data loading throughput by 20x and local throughput by 4x compared to default settings. On three public EO benchmarks, models trained with optimized remote loading achieve the same accuracy as local training within identical time budgets. We improve validation IoU by 6–15% and maintain 85–95% GPU utilization versus 0–30% with standard configurations. Code is publicly available at https://github.com/microsoft/pytorch-cloud-geotiff-optimization.
APA
Zaytar, A., Robinson, C., Tadesse, G. A., Glazer, T., Hacheme, G., Ortiz, A., Dodhia, R. M., & Lavista Ferres, J. M. (2025). Optimizing Cloud-to-GPU Throughput for Deep Learning With Earth Observation Data. Proceedings of The TerraBytes ICML Workshop: Towards global datasets and models for Earth Observation, in Proceedings of Machine Learning Research 292:97-110. Available from https://proceedings.mlr.press/v292/zaytar25a.html.
