Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Abstract

Reinforcement learning (RL) yields substantial improvements in the downstream task performance of large language models (LLMs) and in their alignment with human values. Surprisingly, such large gains result from updating only a small subnetwork comprising just 5% to 30% of the parameters, with the rest effectively unchanged. We refer to this phenomenon as parameter update sparsity induced by RL. It is observed across all 7 widely used RL algorithms (e.g., PPO, GRPO, DPO) and all 10 LLMs from different families in our experiments. This sparsity is intrinsic and occurs without any explicit sparsity-promoting regularization or architectural constraints. Finetuning the subnetwork alone recovers the test accuracy and, remarkably, produces a model nearly identical to the one obtained via full finetuning. The subnetworks obtained from different random seeds, training data, and even RL algorithms overlap substantially more than would be expected by chance. Our analysis suggests that this sparsity is not due to updating only a subset of layers; instead, nearly all parameter matrices receive similarly sparse updates. Moreover, the updates to almost all parameter matrices are nearly full-rank, suggesting that RL updates a small subset of parameters that nevertheless spans almost the full subspaces that the parameter matrices can represent. We conjecture that this update sparsity can be primarily attributed to training on data that is near the policy distribution; techniques that encourage the policy to remain close to the pretrained model, such as KL regularization and gradient clipping, have limited impact.
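
To make the quantities in the abstract concrete, the following is a minimal sketch of how update sparsity and subnetwork overlap could be measured on a toy weight matrix. It is not the authors' implementation: the function names, the tolerance `tol`, the overlap definition, and the toy values are all assumptions chosen for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]
Mask = List[List[bool]]


def update_mask(before: Matrix, after: Matrix, tol: float = 1e-5) -> Mask:
    """Mark parameters that moved by more than tol (tol is an assumed threshold)."""
    return [
        [abs(a - b) > tol for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def _flatten(mask: Mask) -> List[bool]:
    return [m for row in mask for m in row]


def update_sparsity(mask: Mask) -> float:
    """Fraction of parameters left effectively unchanged."""
    flat = _flatten(mask)
    return 1.0 - sum(flat) / len(flat)


def overlap(mask_a: Mask, mask_b: Mask) -> float:
    """Fraction of the parameters updated in run A that run B also updated."""
    flat_a, flat_b = _flatten(mask_a), _flatten(mask_b)
    updated_a = sum(flat_a)
    if updated_a == 0:
        return 0.0
    both = sum(1 for x, y in zip(flat_a, flat_b) if x and y)
    return both / updated_a


def chance_overlap(mask_b: Mask) -> float:
    """Expected overlap if run B's updates were placed uniformly at random."""
    flat_b = _flatten(mask_b)
    return sum(flat_b) / len(flat_b)


# Toy 2x4 "parameter matrix" and two hypothetical finetuning runs.
pretrained = [[0.10, -0.20, 0.30, 0.40], [0.50, 0.60, -0.70, 0.80]]
run_a = [[0.10, -0.25, 0.30, 0.40], [0.50, 0.60, -0.70, 0.90]]
run_b = [[0.10, -0.22, 0.30, 0.40], [0.55, 0.60, -0.70, 0.80]]

mask_a = update_mask(pretrained, run_a)
mask_b = update_mask(pretrained, run_b)

print(update_sparsity(mask_a))   # share of unchanged parameters in run A
print(overlap(mask_a, mask_b))   # observed overlap between the two subnetworks
print(chance_overlap(mask_b))    # baseline expected under random placement
```

In this toy setup each run changes 2 of 8 parameters and the two runs share one of them, so the observed overlap exceeds the random baseline. On a real model the same comparison would be applied per parameter matrix between the pretrained and RL-finetuned checkpoints.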
