
Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

February 12, 2026
Authors: Jinrui Zhang, Chaodong Xiao, Aoqi Wu, Xindong Zhang, Lei Zhang
cs.AI

Abstract

Pretraining large language models (LLMs) typically requires centralized clusters with thousands of high-memory GPUs (e.g., H100/A100). Recent decentralized training methods reduce communication overhead by employing federated optimization; however, they still need to train the entire model on each node, remaining constrained by GPU memory limitations. In this work, we propose SParse Expert Synchronization (SPES), a memory-efficient decentralized framework for pretraining mixture-of-experts (MoE) LLMs. SPES trains only a subset of experts per node, substantially lowering the memory footprint. Each node updates its local experts and periodically synchronizes with other nodes, eliminating full-parameter transmission while ensuring efficient knowledge sharing. To accelerate convergence, we introduce an expert-merging warm-up strategy, where experts exchange knowledge early in training, to rapidly establish foundational capabilities. With SPES, we train a 2B-parameter MoE LLM using 16 standalone 48GB GPUs over internet connections, achieving performance competitive with centrally trained LLMs under similar computational budgets. We further demonstrate scalability by training a 7B model from scratch and a 9B model upcycled from a dense checkpoint, both of which match prior centralized baselines. Our code is available at https://github.com/zjr2000/SPES.
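To make the mechanism concrete, below is a minimal single-process Python sketch of the scheme the abstract describes: each simulated node trains only a subset of experts, shared (non-expert) weights are periodically averaged across all nodes, each expert is averaged only among the nodes that host it, and an early expert-merging pass mimics the warm-up. The class and function names (Node, sync_nodes, merge_experts_warmup), the toy expert assignment, and the plain parameter-averaging rule are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
# Toy, single-process sketch of SPES-style partial-expert training (assumptions,
# not the paper's implementation): nodes hold only some experts, and sync is
# infrequent and restricted to parameters that more than one node shares.
import copy
import torch
import torch.nn as nn

NUM_EXPERTS = 4   # total experts in the toy MoE
DIM = 32          # hidden size of the toy layers


class Node:
    """One decentralized worker: a shared backbone plus a local subset of experts."""

    def __init__(self, expert_ids):
        self.expert_ids = set(expert_ids)
        self.shared = nn.Linear(DIM, DIM)  # stand-in for the non-expert parameters
        self.experts = nn.ModuleDict({str(i): nn.Linear(DIM, DIM) for i in expert_ids})

    def local_step(self):
        # Placeholder for local training on this node's data shard.
        pass


def average_state_dicts(modules):
    """Average the parameters of modules that share the same architecture."""
    avg = copy.deepcopy(modules[0].state_dict())
    for key in avg:
        avg[key] = torch.stack([m.state_dict()[key] for m in modules]).mean(dim=0)
    return avg


def sync_nodes(nodes):
    """Periodic sync: shared weights are averaged over all nodes; each expert is
    averaged only across the nodes that actually host it, so no node ever
    stores or transmits the full set of experts."""
    shared_avg = average_state_dicts([n.shared for n in nodes])
    for n in nodes:
        n.shared.load_state_dict(shared_avg)
    for eid in range(NUM_EXPERTS):
        holders = [n.experts[str(eid)] for n in nodes if eid in n.expert_ids]
        if len(holders) > 1:
            expert_avg = average_state_dicts(holders)
            for h in holders:
                h.load_state_dict(expert_avg)


def merge_experts_warmup(nodes):
    """Assumed form of the expert-merging warm-up: average all experts once so
    early knowledge is shared before they specialize."""
    all_experts = [e for n in nodes for e in n.experts.values()]
    merged = average_state_dicts(all_experts)
    for e in all_experts:
        e.load_state_dict(merged)


if __name__ == "__main__":
    # Four toy nodes, each hosting two of the four experts (with overlap so
    # expert-level averaging has an effect).
    nodes = [Node(ids) for ids in [(0, 1), (1, 2), (2, 3), (3, 0)]]
    merge_experts_warmup(nodes)       # early-phase knowledge exchange
    for step in range(100):
        for n in nodes:
            n.local_step()            # independent local updates
        if step % 25 == 0:
            sync_nodes(nodes)         # infrequent, partial synchronization
```

In a real deployment the nodes would run on separate machines and the averaging would happen over the network; keeping the synchronization interval large and the exchanged parameter set partial is what makes the communication pattern compatible with internet-speed links, in the spirit of the federated-style decentralized methods the abstract contrasts with.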