P1: Mastering Physics Olympiads with Reinforcement Learning

Abstract

Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning: the kind needed to tackle problems whose answers must stand against nature, not merely fit a rubric. Physics is the sharpest test of this shift because it binds symbols to reality in a fundamental way and serves as the cornerstone of most modern technologies. In this work, we aim to advance physics research by developing large language models with exceptional physics reasoning capabilities that excel in particular at solving Olympiad-level physics problems. We introduce P1, a family of open-source physics reasoning models trained entirely through reinforcement learning (RL). Among them, P1-235B-A22B is the first open-source model to achieve gold-medal performance at the latest International Physics Olympiad (IPhO 2025), and it wins 12 gold medals across 13 international and regional physics competitions held in 2024/2025. P1-30B-A3B also surpasses almost all other open-source models on IPhO 2025, earning a silver medal. When further equipped with the agentic framework PhysicsMinions, P1-235B-A22B+PhysicsMinions ranks first overall on IPhO 2025 and obtains the highest average score across the 13 physics competitions. Beyond physics, P1 models also perform strongly on other reasoning tasks such as math and coding, demonstrating the strong generalizability of the P1 series.
