One-shot Entropy Minimization
May 26, 2025
Authors: Zitian Gao, Lynx Chen, Joey Zhou, Bryan Dai
cs.AI
Abstract
We trained 13,440 large language models and found that entropy minimization
requires only a single unlabeled example and 10 optimization steps to achieve
performance improvements comparable to, or even greater than, those obtained
using thousands of examples and carefully designed rewards in rule-based
reinforcement learning. This striking result may prompt a rethinking of
post-training paradigms for large language models. Our code is available at
https://github.com/zitian-gao/one-shot-em.
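
The abstract does not spell out implementation details, but the core idea of entropy minimization as a post-training objective can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact method: the model name, prompt, learning rate, and sampling settings are placeholders, and only the abstract's "single unlabeled example, 10 optimization steps" framing is taken from the paper (see the released code for the real implementation).

```python
# Minimal sketch of one-shot entropy minimization (EM) post-training.
# Assumptions (not from the paper): model name, prompt, learning rate,
# and sampling settings are illustrative placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # hypothetical choice for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A single unlabeled prompt: no reference answer or reward is needed.
prompt = "Solve: what is 17 * 24?"  # illustrative placeholder
inputs = tokenizer(prompt, return_tensors="pt")
prompt_len = inputs.input_ids.shape[1]

for step in range(10):  # the abstract reports only 10 optimization steps
    # Sample a continuation from the current policy. Generation is only
    # used to obtain tokens, so no gradients are tracked here.
    with torch.no_grad():
        generated = model.generate(
            **inputs,
            max_new_tokens=64,
            do_sample=True,
            temperature=1.0,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Re-run a forward pass on the sampled sequence to get logits with grad.
    logits = model(generated).logits  # (1, seq_len, vocab)
    # Logits at position i predict token i+1, so the distributions over the
    # generated tokens live at positions [prompt_len - 1, seq_len - 1).
    gen_logits = logits[:, prompt_len - 1 : -1, :]
    log_probs = F.log_softmax(gen_logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1).mean()  # mean token entropy
    # The loss is the entropy itself: minimizing it sharpens the model's
    # own predictive distribution, using no labels and no reward signal.
    entropy.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: mean token entropy = {entropy.item():.4f}")
```

In contrast to rule-based reinforcement learning, no verifier or reward function appears anywhere in this loop; the only training signal is the model's own predictive uncertainty on its sampled outputs.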