Value-Based Pre-Training with Downstream Feedback
January 29, 2026
Authors: Shuqi Ke, Giulia Fanti
cs.AI
Abstract
Can a small amount of verified goal information steer the expensive self-supervised pretraining of foundation models? Standard pretraining optimizes a fixed proxy objective (e.g., next-token prediction), which can misallocate compute away from downstream capabilities of interest. We introduce V-Pretraining: a value-based, modality-agnostic method for controlled continued pretraining in which a lightweight task designer reshapes the pretraining task to maximize the value of each gradient step. For example, consider self-supervised learning (SSL) with sample augmentation. The V-Pretraining task designer selects pretraining tasks (e.g., augmentations) for which the pretraining loss gradient is aligned with a gradient computed over a downstream task (e.g., image segmentation). This helps steer pretraining towards relevant downstream capabilities. Notably, the pretrained model is never updated on downstream task labels; they are used only to shape the pretraining task. Under matched learner update budgets, V-Pretraining of 0.5B-7B language models improves reasoning (GSM8K test Pass@1) by up to 18% relative over standard next-token prediction using only 12% of GSM8K training examples as feedback. In vision SSL, we improve the state-of-the-art results on ADE20K by up to 1.07 mIoU and reduce NYUv2 RMSE while improving ImageNet linear accuracy, and we provide pilot evidence of improved token efficiency in continued pretraining.
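To make the gradient-alignment idea concrete, here is a minimal sketch (an illustrative assumption, not the authors' method or released code) of how a task designer might score candidate augmentations by the cosine similarity between the pretraining-loss gradient and a downstream-task gradient, assuming a standard PyTorch setup. The names `pretrain_loss`, `downstream_loss`, `candidate_augs`, and the batch arguments are hypothetical placeholders.

```python
# Minimal sketch (assumption, not the paper's implementation) of gradient-aligned
# task selection: pick the augmentation whose pretraining gradient points most in
# the direction of a downstream-task gradient. The downstream labels only score
# candidate tasks; the model is never updated on them.
import torch
import torch.nn.functional as F


def flat_grad(loss, model):
    """Gradient of `loss` w.r.t. all trainable parameters, flattened into one vector."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, allow_unused=True)
    return torch.cat([
        g.reshape(-1) if g is not None else torch.zeros_like(p).reshape(-1)
        for g, p in zip(grads, params)
    ])


def select_augmentation(model, unlabeled_batch, labeled_batch,
                        candidate_augs, pretrain_loss, downstream_loss):
    """Score each candidate augmentation by the cosine similarity between its
    pretraining gradient and the downstream-task gradient, and return the best one."""
    g_down = flat_grad(downstream_loss(model, labeled_batch), model)
    best_aug, best_score = None, float("-inf")
    for aug in candidate_augs:
        g_pre = flat_grad(pretrain_loss(model, aug(unlabeled_batch)), model)
        score = F.cosine_similarity(g_pre, g_down, dim=0).item()
        if score > best_score:
            best_aug, best_score = aug, score
    return best_aug  # use this augmentation for the next pretraining update step
```

The design choice mirrored here is the one stated in the abstract: downstream labels enter only through task selection (choosing which augmentation to train on), while every actual parameter update still comes from the self-supervised pretraining loss.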