
Value-Based Pre-Training with Downstream Feedback

January 29, 2026
Authors: Shuqi Ke, Giulia Fanti
cs.AI

Abstract

Can a small amount of verified goal information steer the expensive self-supervised pretraining of foundation models? Standard pretraining optimizes a fixed proxy objective (e.g., next-token prediction), which can misallocate compute away from downstream capabilities of interest. We introduce V-Pretraining: a value-based, modality-agnostic method for controlled continued pretraining in which a lightweight task designer reshapes the pretraining task to maximize the value of each gradient step. For example, consider self-supervised learning (SSL) with sample augmentation. The V-Pretraining task designer selects pretraining tasks (e.g., augmentations) for which the pretraining loss gradient is aligned with a gradient computed over a downstream task (e.g., image segmentation). This helps steer pretraining towards relevant downstream capabilities. Notably, the pretrained model is never updated on downstream task labels; they are used only to shape the pretraining task. Under matched learner update budgets, V-Pretraining of 0.5B--7B language models improves reasoning (GSM8K test Pass@1) by up to 18% relative over standard next-token prediction using only 12% of GSM8K training examples as feedback. In vision SSL, we improve the state-of-the-art results on ADE20K by up to 1.07 mIoU and reduce NYUv2 RMSE while improving ImageNet linear accuracy, and we provide pilot evidence of improved token efficiency in continued pretraining.
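The abstract describes selecting pretraining tasks (e.g., augmentations) whose pretraining-loss gradient aligns with a gradient computed on a small downstream batch, while updating the model only on the self-supervised objective. The sketch below illustrates that gradient-alignment selection in PyTorch; it is not the authors' implementation, and the names `candidate_augmentations`, `pretrain_loss`, and `downstream_loss` are hypothetical placeholders.

```python
# Minimal sketch (assumed interfaces, not the paper's code) of choosing a
# pretraining task by gradient alignment with a downstream objective.
import torch


def flat_grad(loss, params):
    """Flatten d(loss)/d(params) into one vector without touching .grad."""
    grads = torch.autograd.grad(loss, params, allow_unused=True)
    return torch.cat([
        (g if g is not None else torch.zeros_like(p)).reshape(-1)
        for g, p in zip(grads, params)
    ])


def v_pretraining_step(model, optimizer, batch, labeled_batch,
                       candidate_augmentations, pretrain_loss, downstream_loss):
    params = [p for p in model.parameters() if p.requires_grad]

    # Downstream labels only shape the task; they never update the model.
    g_down = flat_grad(downstream_loss(model, labeled_batch), params)

    # Score each candidate pretraining task by cosine alignment of gradients.
    best_aug, best_score = None, -float("inf")
    for aug in candidate_augmentations:
        g_pre = flat_grad(pretrain_loss(model, aug(batch)), params)
        score = torch.nn.functional.cosine_similarity(g_pre, g_down, dim=0).item()
        if score > best_score:
            best_aug, best_score = aug, score

    # Learner update uses only the selected self-supervised objective.
    optimizer.zero_grad()
    pretrain_loss(model, best_aug(batch)).backward()
    optimizer.step()
    return best_aug, best_score
```

In this sketch the "value" of a gradient step is proxied by cosine similarity between the pretraining and downstream gradients; the paper's task designer may use a different scoring rule or a lighter-weight estimator.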