SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning

February 2, 2026
Authors: Qifan Yu, Xinyu Ma, Zhijian Zhuo, Minrui Wang, Deyi Liu, Shiyi Zhan, Yiyuan Ma, Liang Xiang, Xingyan Bin, Di He
cs.AI

Abstract

Progressive Learning (PL) reduces pre-training computational overhead by gradually increasing model scale. While prior work has extensively explored depth expansion, width expansion remains significantly understudied, with the few existing methods limited to the early stages of training. Expanding width during the mid-stage is essential for maximizing computational savings, yet it remains a formidable challenge due to severe training instabilities. Empirically, we show that naive initialization at this stage disrupts activation statistics, triggering loss spikes, while copy-based initialization introduces gradient symmetry that hinders feature diversity. To address these issues, we propose SPARKLING (balancing Signal Preservation And symmetRy breaKing for width-progressive LearnING), a novel framework for mid-stage width expansion. Our method achieves signal preservation via RMS-scale consistency, stabilizing activation statistics during expansion. Symmetry breaking is ensured through asymmetric optimizer state resetting and learning rate re-warmup. Extensive experiments on Mixture-of-Experts (MoE) models demonstrate that, across multiple width axes and optimizer families, SPARKLING consistently outperforms training from scratch and reduces training cost by up to 35% under 2× width expansion.
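
The tension described in the abstract can be illustrated on a toy example. The sketch below is not the SPARKLING algorithm, whose details the abstract does not give; it is a minimal NumPy illustration on a hypothetical two-layer ReLU MLP. It shows that widening by duplicating hidden units preserves the output but gives the duplicates identical gradients, and that freshly sampled units can be rescaled so their weight RMS matches that of the trained units. Matching weight RMS is used here only as an assumed stand-in for the paper's "RMS-scale consistency", and all function names are illustrative.

```python
import numpy as np

def rms(x):
    """Root-mean-square of all entries of an array."""
    return float(np.sqrt(np.mean(np.square(x))))

def forward(x, w_in, w_out):
    """Two-layer ReLU MLP: x -> relu(x @ w_in) @ w_out."""
    return np.maximum(x @ w_in, 0.0) @ w_out

def grad_w_in(x, w_in, w_out, target):
    """Gradient of 0.5 * squared error w.r.t. w_in (manual backprop)."""
    pre = x @ w_in
    err = np.maximum(pre, 0.0) @ w_out - target
    return x.T @ ((err @ w_out.T) * (pre > 0))

def widen_by_copy(w_in, w_out):
    """Double the hidden width by duplicating units and halving outgoing weights."""
    wide_in = np.concatenate([w_in, w_in], axis=1)
    wide_out = np.concatenate([w_out, w_out], axis=0) / 2.0
    return wide_in, wide_out

def widen_rms_matched(w_in, rng):
    """Append random hidden units rescaled to the RMS of the trained weights."""
    new = rng.standard_normal(w_in.shape)
    new *= rms(w_in) / rms(new)
    return np.concatenate([w_in, new], axis=1)

rng = np.random.default_rng(0)
d, h, n = 8, 16, 32  # input dim, hidden width, batch size (toy values)
x = rng.standard_normal((n, d))
target = rng.standard_normal((n, 4))
w_in = 0.05 * rng.standard_normal((d, h))
w_out = 0.05 * rng.standard_normal((h, 4))

# Copy-based widening: the output is unchanged, but duplicated units
# receive identical gradients, so they cannot diverge on their own.
w_in2, w_out2 = widen_by_copy(w_in, w_out)
same_output = np.allclose(forward(x, w_in, w_out), forward(x, w_in2, w_out2))
g = grad_w_in(x, w_in2, w_out2, target)
symmetric_grads = np.allclose(g[:, :h], g[:, h:])

# RMS-matched widening: new units are random (no symmetry) but share
# the weight scale of the units that were already trained.
w_in_rms = widen_rms_matched(w_in, rng)
rms_ratio = rms(w_in_rms[:, h:]) / rms(w_in_rms[:, :h])

print(same_output, symmetric_grads, round(rms_ratio, 6))
```

In this toy setting the copied units stay tied until something breaks the symmetry; per the abstract, SPARKLING does so through asymmetric optimizer state resetting and learning rate re-warmup, neither of which is modeled here.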