Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning

Abstract

Large language models are typically adapted to downstream tasks through supervised fine-tuning on domain-specific data. While standard fine-tuning focuses on minimizing generation loss to optimize model parameters, we go a step further by retaining and leveraging the model's own learning signals, much as human learners reflect on past mistakes to improve future performance. We first introduce the concept of a Mistake Log to systematically track the model's learning behavior and recurring errors throughout fine-tuning. Treating the original transformer-based model as the Pilot, we design a corresponding Copilot model that improves the Pilot's inference performance through logits rectification. We call the overall Pilot-Copilot framework the Transformer Copilot; it introduces (i) a novel Copilot model design, (ii) a joint training paradigm in which the Copilot continuously learns from the evolving Mistake Log alongside the Pilot, and (iii) a fused inference paradigm in which the Copilot rectifies the Pilot's logits for enhanced generation. We provide both theoretical and empirical analyses of this learning framework. Experiments on 12 benchmarks spanning commonsense, arithmetic, and recommendation tasks show that Transformer Copilot consistently improves performance, by up to 34.5%, while adding only marginal computational overhead to the Pilot models and exhibiting strong scalability and transferability.
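
The abstract describes the Mistake Log and logits rectification only at a high level. The Python sketch below illustrates one plausible reading under explicit assumptions, not the authors' implementation: each Mistake Log entry is assumed to store the token-level error `y − softmax(z)` between the one-hot reference token `y` and the Pilot's predicted distribution over its logits `z`; the Copilot is assumed to predict that error; and fused inference is assumed to add a `lam`-scaled prediction to the Pilot's logits. The names `MistakeLog`, `MeanErrorCopilot`, `rectify`, and the weight `lam` are hypothetical, and the averaging "Copilot" is a toy placeholder for a trained model.

```python
import math
from dataclasses import dataclass, field
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single vocabulary-sized vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_error(pilot_logits: List[float], target: int) -> List[float]:
    """Assumed error signal: one-hot target minus the Pilot's distribution.

    This equals the negative gradient of cross-entropy w.r.t. the logits,
    so adding it to the logits nudges them toward the target token.
    """
    probs = softmax(pilot_logits)
    return [(1.0 if i == target else 0.0) - p for i, p in enumerate(probs)]


@dataclass
class MistakeLog:
    """Hypothetical Mistake Log: token-level errors collected during fine-tuning."""
    errors: List[List[float]] = field(default_factory=list)

    def record(self, pilot_logits: List[float], target: int) -> None:
        self.errors.append(token_error(pilot_logits, target))


class MeanErrorCopilot:
    """Toy stand-in for the Copilot: predicts the average logged error.

    A real Copilot would be a trained model conditioned on the input and the
    Pilot's internal states; this averaging rule is purely illustrative.
    """

    def __init__(self, log: MistakeLog):
        self.log = log

    def predict(self) -> List[float]:
        n = len(self.log.errors)
        if n == 0:
            raise ValueError("Mistake Log is empty")
        # Column-wise mean over all logged error vectors.
        return [sum(col) / n for col in zip(*self.log.errors)]


def rectify(pilot_logits: List[float], correction: List[float],
            lam: float = 1.0) -> List[float]:
    """Hypothetical fused inference: Pilot logits plus a lam-scaled correction."""
    return [z + lam * c for z, c in zip(pilot_logits, correction)]


def argmax(xs: List[float]) -> int:
    return max(range(len(xs)), key=xs.__getitem__)


if __name__ == "__main__":
    # The Pilot keeps preferring token 0 when the reference token is 1.
    pilot_logits = [2.0, 1.5, 0.0]
    log = MistakeLog()
    for _ in range(3):
        log.record(pilot_logits, target=1)

    copilot = MeanErrorCopilot(log)
    fused = rectify(pilot_logits, copilot.predict(), lam=1.0)
    print(argmax(pilot_logits), argmax(fused))  # 0 1
```

The error vector is a deliberate choice for this sketch: `y − softmax(z)` is the negative gradient of the cross-entropy loss with respect to the logits `z`, so adding a prediction of it moves the Pilot's logits toward the reference token. Setting `lam=0.0` recovers the unmodified Pilot output.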
