Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

Abstract

We introduce Pion, a spectrum-preserving optimizer for large language model (LLM) training based on orthogonal equivalence transformation. Unlike additive optimizers such as Adam and Muon, Pion updates each weight matrix through left and right orthogonal transformations, preserving its singular values throughout training. This yields an optimization mechanism that modulates the geometry of weight matrices while keeping their spectral norm fixed. We derive the Pion update rule, systematically examine its design choices, and analyze its convergence behavior along with several key properties. Empirical results show that Pion offers a stable and competitive alternative to standard optimizers for both LLM pretraining and finetuning.
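
The spectrum-preservation property can be checked numerically. The sketch below is only an illustration of that property, not the Pion update rule, which the abstract does not specify. It assumes a 2x2 weight matrix and uses plain rotations as the left and right orthogonal factors; the function names and the angles 0.3 and -0.7 are hypothetical choices made for this example. For a 2x2 matrix the singular values follow in closed form from the Frobenius norm and the determinant, so no external library is needed.

```python
import math


def rotation(theta):
    """2x2 rotation matrix, one simple example of an orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def singular_values_2x2(w):
    """Singular values of a 2x2 matrix in closed form.

    Uses s1^2 + s2^2 = ||W||_F^2 and s1 * s2 = |det W|, so s1^2 and s2^2
    are the roots of x^2 - ||W||_F^2 x + det(W)^2 = 0.
    """
    fro2 = sum(x * x for row in w for x in row)
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    disc = math.sqrt(max(fro2 * fro2 - 4.0 * det * det, 0.0))
    s1 = math.sqrt((fro2 + disc) / 2.0)
    s2 = math.sqrt(max(fro2 - disc, 0.0) / 2.0)
    return s1, s2


def orthogonal_equivalence_step(w, theta_left, theta_right):
    """Return R W Q for rotations R and Q.

    The angles are arbitrary illustrative inputs (an assumption of this
    sketch); how Pion actually chooses its orthogonal factors is not
    described in the abstract.
    """
    return matmul(matmul(rotation(theta_left), w), rotation(theta_right))


w = [[3.0, 1.0], [0.5, 2.0]]
w_new = orthogonal_equivalence_step(w, 0.3, -0.7)

before = singular_values_2x2(w)
after = singular_values_2x2(w_new)
print("singular values before:", before)
print("singular values after: ", after)
```

The entries of the matrix change while both singular values, and therefore the spectral norm (the largest singular value), stay the same up to floating-point error. An additive update of the form W + ΔW generally does not have this property.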
