Value-Based Pre-Training with Downstream Feedback

Abstract

Can a small amount of verified goal information steer the expensive self-supervised pretraining of foundation models? Standard pretraining optimizes a fixed proxy objective (e.g., next-token prediction), which can allocate compute away from the downstream capabilities of interest. We introduce V-Pretraining, a value-based, modality-agnostic method for controlled continued pretraining in which a lightweight task designer reshapes the pretraining task to maximize the value of each gradient step. For example, in self-supervised learning (SSL) with sample augmentation, the V-Pretraining task designer selects pretraining tasks (e.g., augmentations) whose pretraining loss gradient is aligned with a gradient computed on a downstream task (e.g., image segmentation). This steers pretraining toward the relevant downstream capabilities. Notably, the pretrained model is never updated on downstream task labels; the labels are used only to shape the pretraining task. Under matched learner update budgets, V-Pretraining of 0.5B to 7B language models improves reasoning (GSM8K test Pass@1) by up to 18% relative to standard next-token prediction, using only 12% of the GSM8K training examples as feedback. In vision SSL, it improves on state-of-the-art ADE20K results by up to 1.07 mIoU and reduces NYUv2 RMSE while improving ImageNet linear accuracy. We also provide pilot evidence of improved token efficiency in continued pretraining.
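
The gradient-alignment idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear model, the squared-error losses, the cosine-similarity score, the task names `task_a` and `task_b`, and the helper `select_task` are all assumptions made for illustration. The sketch scores each candidate pretraining task by how well its loss gradient aligns with a downstream gradient, picks the best-aligned task, and then updates the model using only the pretraining gradient.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is zero."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def squared_error_grad(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w (toy loss, an assumption)."""
    residual = dot(w, x) - y
    return [residual * xi for xi in x]


def select_task(pretrain_grads, downstream_grad):
    """Hypothetical task designer: pick the task whose gradient best aligns
    with the downstream gradient, scored by cosine similarity."""
    scores = {name: cosine(g, downstream_grad) for name, g in pretrain_grads.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Toy linear model with two weights.
w = [0.0, 0.0]

# Downstream feedback example: used only to compute a reference gradient.
downstream_grad = squared_error_grad(w, [1.0, 0.0], 1.0)

# Two candidate pretraining tasks, each represented by one toy example.
pretrain_grads = {
    "task_a": squared_error_grad(w, [1.0, 0.1], 1.0),
    "task_b": squared_error_grad(w, [0.0, 1.0], 1.0),
}

best, scores = select_task(pretrain_grads, downstream_grad)

# The model update uses the selected pretraining gradient only;
# the downstream gradient never enters the weight update.
learning_rate = 0.5
w_new = [wi - learning_rate * gi for wi, gi in zip(w, pretrain_grads[best])]
```

In this toy setup `task_a` is selected because its gradient points almost in the same direction as the downstream gradient, while the gradient of `task_b` is orthogonal to it. The downstream example influences which task is chosen but never contributes to the weight update, mirroring the abstract's statement that downstream labels are used only to shape the pretraining task.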
