SkillFactory: Self-Distillation For Learning Cognitive Behaviors

Abstract

Reasoning models that leverage long chains of thought employ various cognitive skills, such as verifying their answers, backtracking, and retrying with an alternate method. Previous work has shown that when a base language model already exhibits these skills, further training with reinforcement learning (RL) can teach the model to leverage them. How can we get models to leverage skills that base models do not exhibit? Our work, SkillFactory, is a method for fine-tuning models to roughly learn these skills during a supervised fine-tuning (SFT) stage prior to RL. Our approach does not rely on distillation from a stronger model; instead, it uses samples from the model itself, rearranged to provide training data in the format of those skills. These "silver" SFT traces may be imperfect, but they are nevertheless effective for priming a model to acquire skills during RL. Our evaluation shows that (1) starting from SkillFactory SFT initialization helps a model generalize to harder variants of a task post-RL, despite lower performance pre-RL; (2) the model does in fact use the cognitive skills; and (3) RL-trained SkillFactory models are more robust to regression on out-of-domain tasks than RL-trained base models. Our work suggests that inductive biases learned prior to RL help models learn robust cognitive skill use.
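As a rough illustration of what "samples from the model itself, rearranged" into a skill format could look like, the sketch below stitches several self-generated attempts into a single try, check, retry trace. Everything here is an assumption made for illustration: the abstract does not specify the trace format, the `Sample` type, the `build_silver_trace` helper, the "Attempt"/"Check" wording, or the use of a reference answer to decide which attempts are wrong. It should not be read as the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One self-generated attempt: free-form reasoning plus a final answer (hypothetical type)."""
    reasoning: str
    answer: str


def build_silver_trace(samples: List[Sample], reference: str) -> Optional[str]:
    """Rearrange the model's own samples into one trace that tries, checks, and retries.

    Assumed format (not taken from the paper): wrong attempts come first,
    each followed by a check that rejects it, and the trace ends on a
    correct attempt. Returns None when no sample matches the reference.
    """
    wrong = [s for s in samples if s.answer != reference]
    right = [s for s in samples if s.answer == reference]
    if not right:
        return None  # nothing correct to end the trace on

    parts = []
    for s in wrong:
        parts.append(f"Attempt: {s.reasoning}\nAnswer: {s.answer}")
        parts.append("Check: this answer does not hold up. Let me try another approach.")

    final = right[0]
    parts.append(f"Attempt: {final.reasoning}\nAnswer: {final.answer}")
    parts.append("Check: this answer holds up.")
    parts.append(f"Final answer: {final.answer}")
    return "\n".join(parts)
```

Under these assumptions, a trace built from one wrong and one correct sample would show the model rejecting its first answer and then retrying, which is the kind of imperfect "silver" demonstration the abstract describes as a starting point for RL.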
