Weak-Driven Learning: How Weak Agents Make Strong Agents Stronger
February 9, 2026
Authors: Zehao Chen, Gongxun Li, Tianxiang Ai, Yifei Li, Zixuan Huang, Wang Zhou, Fuzhen Zhuang, Xianglong Liu, Jianxin Li, Deqing Wang, Yikun Ban
cs.AI
Abstract
As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative supervision signals remain latent in models' own historical weak states. Motivated by this observation, we propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. By identifying recoverable learning gaps via entropy dynamics and reinforcing them through compensatory learning, WMSS enables strong agents to improve beyond conventional post-training saturation. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve consistent performance improvements while incurring zero additional inference cost.
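The abstract does not specify how "entropy dynamics" between weak and strong checkpoints are computed. One plausible reading is a token-level comparison: positions where a historical weak checkpoint was still uncertain but the strong model has since become confident mark where learning occurred, and the entropy change can be used as a signal for where compensatory supervision might help. The sketch below illustrates that reading only; the function names, the entropy-drop criterion, and the `gap_threshold` value are all illustrative assumptions, not the paper's actual method.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def recoverable_gaps(weak_dists, strong_dists, gap_threshold=0.5):
    """Flag token positions whose entropy dropped sharply between a weak
    checkpoint and the strong model -- a crude proxy for the 'entropy
    dynamics' signal described in the abstract (assumed criterion).

    weak_dists / strong_dists: per-position probability distributions
    from the weak checkpoint and the strong model, respectively.
    Returns (position, entropy_drop) pairs exceeding the threshold.
    """
    gaps = []
    for i, (w, s) in enumerate(zip(weak_dists, strong_dists)):
        drop = token_entropy(w) - token_entropy(s)
        if drop > gap_threshold:
            gaps.append((i, drop))
    return gaps

# Position 0: weak checkpoint uncertain (uniform), strong model confident.
# Position 1: both models agree, no entropy change -- not flagged.
weak = [[0.5, 0.5], [0.9, 0.1]]
strong = [[0.99, 0.01], [0.9, 0.1]]
print(recoverable_gaps(weak, strong))  # flags position 0 only
```

In a full pipeline, the flagged positions would presumably receive extra loss weight during the compensatory-learning phase; the abstract gives no further detail, so this stops at gap identification.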