MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

June 11, 2026
Authors: Jiacheng Chen, Xinyu Zhang, Shunkai Zhang, Yanmohan Wang, Lin Li, Tiancheng Qin, Qin Wang, Zhengmao Zhu, Tianle Li, Jingyang Li, Zehan Li, Binyang Jiang, Jin Zhu, Han Ding, Fei Yu, Chenyu Du, Zijian Song, Jiayuan Song, Zhi Zhang, Yunan Huang, Weiyu Cheng, Pengyu Zhao, Yu Cheng
cs.AI

Abstract

We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 is first trained for three proof-oriented capabilities -- proof generation, proof verification, and critique-conditioned proof repair -- using a defense-in-depth generative verifier engineered for a low false-positive rate. These capabilities are merged into a single released M3 model. At test time, MaxProof uses that one model as generator, verifier, refiner, and ranker: it searches over a population of candidate proofs and returns one final proof through tournament selection. With MaxProof test-time scaling, the M3 model reaches 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
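
The test-time procedure described in the abstract (generate a population, verify, repair from the critique, then pick one proof by tournament) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `population_search`, the four callables, the single-elimination bracket, and the parameters `population_size`, `rounds`, and `accept_threshold` are all assumptions made for illustration. In MaxProof the four roles would be played by the same M3 model; here they are plain callables so the control flow can be read on its own.

```python
from typing import Callable, List, Tuple


def population_search(
    problem: str,
    generate: Callable[[str], str],                      # hypothetical: problem -> candidate proof
    verify: Callable[[str, str], Tuple[float, str]],     # hypothetical: -> (score, critique)
    refine: Callable[[str, str, str], str],              # hypothetical: (problem, proof, critique) -> proof
    prefer: Callable[[str, str, str], bool],             # hypothetical: True if the first proof is better
    population_size: int = 8,                            # assumed default, not from the paper
    rounds: int = 2,                                     # assumed default, not from the paper
    accept_threshold: float = 0.5,                       # assumed default, not from the paper
) -> str:
    """Illustrative population search that ends in a pairwise tournament."""
    if population_size < 1:
        raise ValueError("population_size must be at least 1")

    # 1. Generator role: sample an initial population of candidate proofs.
    population: List[str] = [generate(problem) for _ in range(population_size)]

    # 2. Verifier and refiner roles: repair candidates the verifier rejects,
    #    conditioning the repair on the verifier's critique.
    for _ in range(rounds):
        next_population: List[str] = []
        for proof in population:
            score, critique = verify(problem, proof)
            if score >= accept_threshold:
                next_population.append(proof)
            else:
                next_population.append(refine(problem, proof, critique))
        population = next_population

    # 3. Ranker role: single-elimination tournament over pairwise comparisons.
    contenders = population
    while len(contenders) > 1:
        winners: List[str] = []
        for i in range(0, len(contenders) - 1, 2):
            a, b = contenders[i], contenders[i + 1]
            winners.append(a if prefer(problem, a, b) else b)
        if len(contenders) % 2 == 1:
            winners.append(contenders[-1])  # odd one out advances with a bye
        contenders = winners
    return contenders[0]
```

A single-elimination bracket needs only `population_size - 1` pairwise comparisons, which is why it is used in this sketch; the abstract does not say which tournament format MaxProof uses.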