AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions
September 16, 2025
Authors: Väinö Hatanpää, Eugene Ku, Jason Stock, Murali Emani, Sam Foreman, Chunyong Jung, Sandeep Madireddy, Tung Nguyen, Varuni Sastry, Ray A. O. Sinurat, Sam Wheeler, Huihuo Zheng, Troy Arcomano, Venkatram Vishwanath, Rao Kotamarthi
cs.AI
Abstract
Generative machine learning offers new opportunities to better understand
complex Earth system dynamics. Recent diffusion-based methods address spectral
biases and improve ensemble calibration in weather forecasting compared to
deterministic methods, yet have so far proven difficult to scale stably at high
resolutions. We introduce AERIS, a 1.3B to 80B parameter pixel-level Swin
diffusion transformer to address this gap, and SWiPe, a generalizable technique
that composes window parallelism with sequence and pipeline parallelism to
shard window-based transformers without added communication cost or increased
global batch size. On Aurora (10,080 nodes), AERIS sustains 10.21 ExaFLOPS
(mixed precision) and a peak performance of 11.21 ExaFLOPS with 1 × 1
patch size on the 0.25° ERA5 dataset, achieving 95.5% weak scaling
efficiency and 81.6% strong scaling efficiency. AERIS outperforms the IFS ENS
and remains stable on seasonal scales to 90 days, highlighting the potential of
billion-parameter diffusion models for weather and climate prediction.
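
To put the headline figures in context, the following is a minimal back-of-the-envelope sketch, not the authors' code. It assumes the regular 0.25° latitude-longitude grid has 721 × 1440 points (the abstract does not state the grid AERIS uses), and it uses common textbook definitions of weak and strong scaling efficiency, which may differ from how the paper measures them. All function names are hypothetical.

```python
# Figures quoted in the abstract.
NODES = 10_080
SUSTAINED_EXAFLOPS = 10.21
PEAK_EXAFLOPS = 11.21
PATCH = 1  # 1 x 1 patch size

# Assumption (not from the abstract): a regular 0.25-degree grid
# with 721 latitude rows and 1440 longitude columns.
GRID_LAT, GRID_LON = 721, 1440


def tokens_per_sample(lat: int, lon: int, patch: int) -> int:
    """Number of patch tokens when a lat x lon grid is tiled by patch x patch tiles."""
    return (lat // patch) * (lon // patch)


def per_node_petaflops(exaflops: float, nodes: int) -> float:
    """Average throughput per node, converting ExaFLOPS to PetaFLOPS."""
    return exaflops * 1000.0 / nodes


def weak_scaling_efficiency(base_throughput: float, base_nodes: int,
                            throughput: float, nodes: int) -> float:
    """Per-node throughput at scale relative to per-node throughput at baseline."""
    return (throughput / nodes) / (base_throughput / base_nodes)


def strong_scaling_efficiency(base_time: float, base_nodes: int,
                              time: float, nodes: int) -> float:
    """Achieved speedup divided by ideal speedup for a fixed total problem size."""
    return (base_time * base_nodes) / (time * nodes)


if __name__ == "__main__":
    print("tokens per sample:", tokens_per_sample(GRID_LAT, GRID_LON, PATCH))
    print("sustained PFLOPS per node:",
          round(per_node_petaflops(SUSTAINED_EXAFLOPS, NODES), 2))
    print("sustained / peak:", round(SUSTAINED_EXAFLOPS / PEAK_EXAFLOPS, 3))
```

Under the grid assumption, a 1 × 1 patch size gives roughly one million tokens per sample, which is why sharding a single sample across devices matters at this resolution. The sustained figure works out to about 1 PetaFLOPS per node on average, around 91% of the reported peak.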