ROOT: Robust Orthogonalized Optimizer for Neural Network Training

November 25, 2025
Authors: Wei He, Kai Han, Hang Zhou, Hanting Chen, Zhicheng Liu, Xinghao Chen, Yunhe Wang
cs.AI

Abstract

The optimization of large language models (LLMs) remains a critical challenge, particularly as model scaling exacerbates sensitivity to algorithmic imprecision and training instability. Recent advances in optimizers have improved convergence efficiency through momentum orthogonalization, but suffer from two key robustness limitations: dimensional fragility in orthogonalization precision and vulnerability to outlier-induced noise. To address these robustness challenges, we introduce ROOT, a Robust Orthogonalized Optimizer that enhances training stability through dual robustness mechanisms. First, we develop a dimension-robust orthogonalization scheme using adaptive Newton iterations with fine-grained coefficients tailored to specific matrix sizes, ensuring consistent precision across diverse architectural configurations. Second, we introduce an optimization-robust framework via proximal optimization that suppresses outlier noise while preserving meaningful gradient directions. Extensive experiments demonstrate that ROOT achieves significantly improved robustness, with faster convergence and superior final performance compared to both Muon and Adam-based optimizers, particularly in noisy and non-convex scenarios. Our work establishes a new paradigm for developing robust and precise optimizers capable of handling the complexities of modern large-scale model training. The code will be available at https://github.com/huawei-noah/noah-research/tree/master/ROOT.
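As a concrete reference point, the sketch below illustrates the two ingredients the abstract names. The Newton-Schulz iteration and its fixed quintic coefficients (3.4445, -4.7750, 2.0315) come from Muon's public implementation, which ROOT builds on; ROOT's adaptive, size-specific coefficients are not given in the abstract, so the fixed ones stand in for them. The soft-threshold step is one standard proximal operator (the prox of an L1 penalty), used here only to illustrate outlier suppression; the paper's actual proximal formulation may differ.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize the momentum matrix G via Newton-Schulz.

    Coefficients are the fixed quintic ones from Muon; ROOT replaces these
    with fine-grained coefficients adapted to the matrix size (not shown here).
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)           # scale so singular values lie in [0, 1]
    transposed = X.size(0) > X.size(1)
    if transposed:                       # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def soft_threshold(M: torch.Tensor, lam: float) -> torch.Tensor:
    """Proximal operator of lam * ||.||_1: shrinks every entry toward zero.

    A stand-in for ROOT's proximal step: it damps outlier spikes in the
    momentum while leaving the dominant gradient directions intact.
    """
    return torch.sign(M) * torch.clamp(M.abs() - lam, min=0.0)

# Hypothetical combination of the two pieces (names, ordering, and the lam
# value are illustrative, not taken from the paper):
momentum = torch.randn(1024, 4096)
update = newton_schulz_orthogonalize(soft_threshold(momentum, lam=0.01))
```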