P1: Mastering Physics Olympiads with Reinforcement Learning

Abstract

Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning: the kind needed to tackle problems whose answers must stand up to nature, not merely fit a rubric. Physics, which binds symbols to reality in a fundamental way and serves as the cornerstone of most modern technologies, is the sharpest test of this shift. In this work, we advance physics research by developing large language models with exceptional physics reasoning capabilities, particularly in solving Olympiad-level physics problems. We introduce P1, a family of open-source physics reasoning models trained entirely through reinforcement learning (RL). Among them, P1-235B-A22B is the first open-source model to achieve gold-medal performance at the latest International Physics Olympiad (IPhO 2025), and it wins gold medals in 12 of 13 international/regional physics competitions held in 2024/2025. P1-30B-A3B also surpasses almost all other open-source models on IPhO 2025, earning a silver medal. When further equipped with the agentic framework PhysicsMinions, P1-235B-A22B+PhysicsMinions ranks first overall on IPhO 2025 and obtains the highest average score across the 13 physics competitions. Beyond physics, P1 models also perform strongly on other reasoning tasks such as math and coding, demonstrating the strong generalizability of the P1 series.