AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions

September 16, 2025
Authors: Väinö Hatanpää, Eugene Ku, Jason Stock, Murali Emani, Sam Foreman, Chunyong Jung, Sandeep Madireddy, Tung Nguyen, Varuni Sastry, Ray A. O. Sinurat, Sam Wheeler, Huihuo Zheng, Troy Arcomano, Venkatram Vishwanath, Rao Kotamarthi
cs.AI

Abstract

Generative machine learning offers new opportunities to better understand complex Earth system dynamics. Compared with deterministic methods, recent diffusion-based methods address spectral biases and improve ensemble calibration in weather forecasting, yet they have so far proven difficult to scale stably at high resolutions. To address this gap, we introduce AERIS, a 1.3B to 80B parameter pixel-level Swin diffusion transformer, and SWiPe, a generalizable technique that composes window parallelism with sequence and pipeline parallelism to shard window-based transformers without added communication cost or increased global batch size. On Aurora (10,080 nodes), AERIS sustains 10.21 ExaFLOPS (mixed precision) and reaches a peak performance of 11.21 ExaFLOPS with a 1 × 1 patch size on the 0.25° ERA5 dataset, achieving 95.5% weak scaling efficiency and 81.6% strong scaling efficiency. AERIS outperforms the IFS ENS and remains stable on seasonal scales to 90 days, highlighting the potential of billion-parameter diffusion models for weather and climate prediction.
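
The abstract reports weak and strong scaling efficiencies without stating how they are computed. The sketch below uses the conventional definitions, which is an assumption about the authors' methodology. The function names and all timings are hypothetical and chosen only for illustration; they are not measurements from Aurora.

```python
def weak_scaling_efficiency(base_time: float, scaled_time: float) -> float:
    """Weak scaling: work per node is fixed, so the ideal runtime stays constant.

    Efficiency is the baseline step time divided by the step time at scale.
    """
    return base_time / scaled_time


def strong_scaling_efficiency(
    base_nodes: int, base_time: float, nodes: int, time: float
) -> float:
    """Strong scaling: total work is fixed, so the ideal runtime shrinks
    in proportion to the increase in node count.
    """
    ideal_time = base_time * base_nodes / nodes
    return ideal_time / time


# Hypothetical timings in seconds per training step (not from the paper).
weak = weak_scaling_efficiency(base_time=10.0, scaled_time=10.5)
strong = strong_scaling_efficiency(
    base_nodes=100, base_time=80.0, nodes=400, time=25.0
)
print(f"weak: {weak:.1%}, strong: {strong:.1%}")
```

Under these definitions, a value of 100% means perfect scaling. The reported 95.5% weak scaling efficiency would then correspond to a step time roughly 4.7% longer than the baseline when the problem size grows with the node count.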