Reparameterized LLM Training via Orthogonal Equivalence Transformation

Abstract


While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
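The spectral-preservation claim follows from a short argument. If W0 = U Σ Vᵀ is a singular value decomposition, then R W0 P = (R U) Σ (Pᵀ V)ᵀ. Because RU and PᵀV are again orthogonal, the product has exactly the singular values of W0. The sketch below checks this numerically. It assumes the reparameterized weight has the form W = R W0 P, with R and P square and orthogonal and W0 a frozen random matrix. The abstract does not say how orthogonality is enforced, so the Cayley transform used here is our own assumption and not necessarily POET's choice. All helper names (`cayley`, `poet_weight`, `gram_power_traces`) are hypothetical. Only the forward reparameterization is shown. There is no training loop and none of the efficient approximations the abstract mentions.

```python
import random

Matrix = list[list[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def axpy(a: Matrix, b: Matrix, scale: float = 1.0) -> Matrix:
    """Return a + scale * b, elementwise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(a: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + e for row, e in zip(a, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cayley(params: Matrix) -> Matrix:
    """Assumed parameterization: Q = (I + S)(I - S)^-1 with S = A - A^T.
    S is skew-symmetric, so I - S is always invertible and Q is orthogonal."""
    n = len(params)
    skew = axpy(params, transpose(params), -1.0)
    eye = identity(n)
    return matmul(axpy(eye, skew), inverse(axpy(eye, skew, -1.0)))


def poet_weight(r_params: Matrix, w0: Matrix, p_params: Matrix) -> Matrix:
    """Hypothetical POET-style weight: W = R @ W0 @ P, with W0 kept fixed."""
    return matmul(matmul(cayley(r_params), w0), cayley(p_params))


def gram_power_traces(w: Matrix, k_max: int) -> list[float]:
    """tr((W^T W)^k) = sum_i sigma_i^(2k) for k = 1..k_max.
    For k = 1..n these power sums determine all n singular values."""
    gram = matmul(transpose(w), w)
    power = identity(len(gram))
    traces = []
    for _ in range(k_max):
        power = matmul(power, gram)
        traces.append(sum(power[i][i] for i in range(len(power))))
    return traces


random.seed(0)
m, n = 4, 3
w0 = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]        # fixed random weights
r_params = [[random.gauss(0.0, 0.1) for _ in range(m)] for _ in range(m)]  # learnable, m x m
p_params = [[random.gauss(0.0, 0.1) for _ in range(n)] for _ in range(n)]  # learnable, n x n

w = poet_weight(r_params, w0, p_params)
before, after = gram_power_traces(w0, n), gram_power_traces(w, n)
print(all(abs(a - b) <= 1e-9 * max(1.0, abs(a)) for a, b in zip(before, after)))  # True
```

In this sketch only `r_params` and `p_params` would receive gradients, and W0 stays frozen. Comparing traces of powers of WᵀW instead of running an SVD keeps the example dependency-free. With NumPy, one would compare `numpy.linalg.svd(w, compute_uv=False)` against the same call on `w0` directly. A practical implementation would use a tensor library with automatic differentiation rather than nested Python lists.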
