MaxProof: Scaling Mathematical Proof with Generation-Verification Reinforcement Learning and Population-Level Test-Time Scaling

Abstract

We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof, applied to the MiniMax-M3 series. M3 is first trained for three proof-oriented capabilities (proof generation, proof verification, and critique-conditioned proof repair) using a defense-in-depth generative verifier engineered for a low false-positive rate. These capabilities are merged into a single released M3 model. At test time, MaxProof uses that model as generator, verifier, refiner, and ranker: it searches over a population of candidate proofs and returns one final proof through tournament selection. With MaxProof test-time scaling, the M3 model scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
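
The test-time loop described above (generate a population, verify, repair from critiques, then pick a winner by tournament) can be illustrated with a minimal Python sketch. This is not the MaxProof implementation: the abstract does not specify the population size, the number of repair rounds, the acceptance threshold, or how the ranker compares two proofs, so all of those are assumptions here. The names `Candidate`, `population_search`, `tournament_select`, and the `toy_*` functions are hypothetical stand-ins for calls to the single model in its four roles.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    proof: str
    score: float = 0.0   # verifier score (assumed to lie in [0, 1])
    critique: str = ""   # verifier feedback used by the refiner


def tournament_select(pool: List[Candidate],
                      better: Callable[[Candidate, Candidate], bool],
                      rng: random.Random) -> Candidate:
    """Single-elimination tournament: pair candidates, keep winners, repeat."""
    if not pool:
        raise ValueError("pool must contain at least one candidate")
    pool = list(pool)
    while len(pool) > 1:
        rng.shuffle(pool)
        winners = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            winners.append(a if better(a, b) else b)
        if len(pool) % 2 == 1:
            winners.append(pool[-1])  # odd candidate out gets a bye
        pool = winners
    return pool[0]


def population_search(problem: str,
                      generate: Callable[[str], str],
                      verify: Callable[[str, str], Tuple[float, str]],
                      refine: Callable[[str, str, str], str],
                      better: Callable[[Candidate, Candidate], bool],
                      population_size: int = 8,   # assumed value
                      rounds: int = 2,            # assumed value
                      accept: float = 1.0,        # assumed threshold
                      seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    population = [Candidate(generate(problem)) for _ in range(population_size)]
    for _ in range(rounds):
        # Verifier role: score and critique every candidate.
        for c in population:
            c.score, c.critique = verify(problem, c.proof)
        # Refiner role: repair only candidates the verifier rejected.
        population = [
            c if c.score >= accept
            else Candidate(refine(problem, c.proof, c.critique))
            for c in population
        ]
    # Score the final population, then let the ranker pick one proof.
    for c in population:
        c.score, c.critique = verify(problem, c.proof)
    return tournament_select(population, better, rng)


# Toy stand-ins for the four model roles (purely illustrative).
def toy_generate(problem: str) -> str:
    return f"Proof sketch for {problem}."


def toy_verify(problem: str, proof: str) -> Tuple[float, str]:
    if proof.endswith("QED"):
        return 1.0, ""
    return 0.0, "missing conclusion"


def toy_refine(problem: str, proof: str, critique: str) -> str:
    return proof + " QED"


def toy_better(a: Candidate, b: Candidate) -> bool:
    return a.score >= b.score
```

In this sketch the ranker is a pairwise comparison on verifier scores. A real ranker role would presumably compare the proofs themselves, which is why `better` is passed in as a function rather than fixed.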