Reparameterized LLM Training via Orthogonal Equivalence Transformation

Abstract

While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
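The spectral-preservation claim can be checked numerically. The sketch below is a minimal illustration, not the POET training algorithm itself. It assumes the reparameterized weight takes the form W = R W0 P, with W0 a fixed random matrix and R, P square orthogonal matrices; the abstract does not spell out this form. The helper names `random_orthogonal` and `oet_weight` are hypothetical, and the orthogonal matrices are simply sampled via QR rather than learned.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs using the diagonal of r; q remains orthogonal.
    return q * np.sign(np.diag(r))


def oet_weight(R, W0, P):
    """Assumed form of the orthogonal equivalence transformation: W = R @ W0 @ P."""
    return R @ W0 @ P


rng = np.random.default_rng(0)
m, n = 6, 4                        # illustrative layer dimensions (assumption)

W0 = rng.standard_normal((m, n))   # fixed random weight matrix
R = random_orthogonal(m, rng)      # stands in for the learnable left orthogonal matrix
P = random_orthogonal(n, rng)      # stands in for the learnable right orthogonal matrix

W = oet_weight(R, W0, P)

# Singular values of W0 and W should agree up to floating-point error.
sv_fixed = np.linalg.svd(W0, compute_uv=False)
sv_transformed = np.linalg.svd(W, compute_uv=False)
spectrum_preserved = np.allclose(sv_fixed, sv_transformed)
print(spectrum_preserved)
```

Under these assumptions, whatever values R and P take during training, the singular values of W stay those of W0, so the spectrum is set entirely by how W0 is initialized.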
