Learning to Discover at Test Time
January 22, 2026
Authors: Mert Yuksekgonul, Daniel Koceja, Xinhao Li, Federico Bianchi, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou, Carlos Guestrin, Yu Sun
cs.AI
Abstract
How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We perform reinforcement learning at test time, so the LLM can continue to train, but now with experience specific to the test problem. This form of continual learning is quite special, because its goal is to produce one great solution rather than many good ones on average, and to solve this very problem rather than generalize to other problems. Therefore, our learning objective and search subroutine are designed to prioritize the most promising solutions. We call this method Test-Time Training to Discover (TTT-Discover). Following prior work, we focus on problems with continuous rewards. We report results for every problem we attempted, across mathematics, GPU kernel engineering, algorithm design, and biology. TTT-Discover sets the new state of the art in almost all of them: (i) Erdős' minimum overlap problem and an autocorrelation inequality; (ii) a GPUMode kernel competition (up to 2× faster than prior art); (iii) past AtCoder algorithm competitions; and (iv) a denoising problem in single-cell analysis. Our solutions are reviewed by experts or the organizers. All our results are achieved with an open model, OpenAI gpt-oss-120b, and can be reproduced with our publicly available code, in contrast to previous best results that required closed frontier models. Our test-time training runs are performed using Tinker, an API by Thinking Machines, with a cost of only a few hundred dollars per problem.
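To make the test-time loop described above concrete, here is a minimal toy sketch of searching with a continuous reward while updating the proposer at test time and prioritizing the single best solution found so far. This is not the paper's implementation: all names (`reward`, `propose`, `test_time_discover`) are hypothetical, the "policy" is a simple Gaussian proposal over a numeric vector rather than gpt-oss-120b, and the update is a crude mean shift toward the best candidate rather than RL through the Tinker API.

```python
import random
import math

# Toy illustration (NOT the paper's method): the policy is a Gaussian proposal
# over a parameter vector, and the single test problem is to maximize a
# continuous reward. The loop keeps improving the proposer at test time and
# returns one best solution, mirroring the stated goal of producing a single
# great solution for this very problem.

def reward(x):
    # Continuous reward for one fixed test problem (here: a toy function).
    return -sum((xi - 3.0) ** 2 for xi in x)

def propose(mean, sigma, dim):
    # Sample a candidate solution from the current policy.
    return [random.gauss(mean[i], sigma) for i in range(dim)]

def test_time_discover(dim=5, steps=200, samples_per_step=16, lr=0.5, sigma=1.0):
    mean = [0.0] * dim                      # policy parameters, updated at test time
    best_x, best_r = None, -math.inf
    for _ in range(steps):
        candidates = [propose(mean, sigma, dim) for _ in range(samples_per_step)]
        scored = sorted(((reward(x), x) for x in candidates), reverse=True)
        top_r, top_x = scored[0]
        if top_r > best_r:                  # track the single best solution so far
            best_r, best_x = top_r, top_x
        # Prioritize the most promising solution: move the proposal toward
        # this step's best candidate rather than averaging over all of them.
        mean = [m + lr * (t - m) for m, t in zip(mean, top_x)]
    return best_x, best_r

if __name__ == "__main__":
    x, r = test_time_discover()
    print(f"best reward {r:.4f} at {x}")
```

In the paper's setting, the proposer is an open LLM whose weights are actually fine-tuned during the search, so the analogue of the mean-shift step is a reinforcement-learning update on the model itself.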