Weak-Driven Learning: How Weak Agents Make Strong Agents Stronger
February 9, 2026
Authors: Zehao Chen, Gongxun Li, Tianxiang Ai, Yifei Li, Zixuan Huang, Wang Zhou, Fuzhen Zhuang, Xianglong Liu, Jianxin Li, Deqing Wang, Yikun Ban
cs.AI
Abstract
As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative supervision signals remain latent in models' own historical weak states. Motivated by this observation, we propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. By identifying recoverable learning gaps via entropy dynamics and reinforcing them through compensatory learning, WMSS enables strong agents to improve beyond conventional post-training saturation. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve significant performance improvements while incurring no additional inference cost.
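Since the abstract only names the ingredients (weak checkpoints, entropy dynamics, compensatory learning), the following is a minimal sketch of one way such a scheme could be wired up, assuming the gap detector compares per-token predictive entropy between a frozen weak checkpoint and the current strong model, and that compensatory learning amounts to reweighting the training loss on the flagged tokens. The function names, the threshold, and the 2x weight are illustrative assumptions, not the authors' WMSS implementation.

# Illustrative sketch only: a weak-checkpoint-guided, entropy-weighted loss.
# All names (token_entropy, compensatory_loss, gap_threshold) and the 2x
# upweighting are assumptions for illustration, not the authors' WMSS code.
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Per-token entropy of the predictive distribution; logits (B, T, V) -> (B, T).
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

def compensatory_loss(strong_logits, weak_logits, labels, gap_threshold=0.5):
    # Token-level cross-entropy for the strong model, upweighted on tokens whose
    # predictive entropy differs most between the frozen weak checkpoint and the
    # current strong model -- one possible proxy for the "recoverable learning
    # gaps" the abstract refers to.
    ce = F.cross_entropy(strong_logits.transpose(1, 2), labels, reduction="none")  # (B, T)
    with torch.no_grad():
        gap = (token_entropy(weak_logits) - token_entropy(strong_logits)).abs()
        weights = torch.where(gap > gap_threshold,
                              torch.full_like(gap, 2.0),
                              torch.ones_like(gap))
    return (weights * ce).mean()

In practice the weak checkpoint would be an earlier snapshot of the same model kept frozen during post-training; only the strong model receives gradients, so the extra cost is one additional forward pass at training time and nothing at inference, consistent with the abstract's zero-inference-overhead claim.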