ChatPaper.aiChatPaper

Numina-Lean-Agent:面向形式化数学的开放通用智能推理系统

Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics

January 20, 2026
作者: Junqi Liu, Zihao Zhou, Zekai Zhu, Marco Dos Santos, Weikun He, Jiawei Liu, Ran Wang, Yunzhou Xie, Junqiao Zhao, Qiufeng Wang, Lihong Zhi, Jia Li, Wenda Li
cs.AI

摘要

近期,智能体系统已成为形式化定理证明的主流范式,通过协调多个模型与工具实现了强劲性能。然而现有方法常依赖特定任务流水线和训练过的形式证明器,限制了灵活性与可复现性。本文提出直接使用通用编程智能体作为形式数学推理器的新范式,其优势在于:(1)通用编程智能体为证明之外的多样化推理任务提供自然接口;(2)仅需替换底层基础模型即可提升性能,无需训练;(3)MCP框架支持灵活扩展并自主调用专用工具,避免复杂设计。基于此范式,我们推出Numina-Lean-Agent,融合Claude Code与Numina-Lean-MCP以实现与Lean的自主交互、相关定理检索、非形式化证明及辅助推理工具调用。以Claude Opus 4.5为基础模型时,Numina-Lean-Agent在Putnam 2025全部12道题目中实现满分解答(12/12),媲美最佳闭源系统。除基准评估外,我们还通过协助数学家成功形式化Brascamp-Lieb定理,进一步验证其泛化能力。Numina-Lean-Agent及全部解题代码已发布于https://github.com/project-numina/numina-lean-agent。
English
Agentic systems have recently become the dominant paradigm for formal theorem proving, achieving strong performance by coordinating multiple models and tools. However, existing approaches often rely on task-specific pipelines and trained formal provers, limiting their flexibility and reproducibility. In this paper, we propose the paradigm that directly uses a general coding agent as a formal math reasoner. This paradigm is motivated by (1) A general coding agent provides a natural interface for diverse reasoning tasks beyond proving, (2) Performance can be improved by simply replacing the underlying base model, without training, and (3) MCP enables flexible extension and autonomous calling of specialized tools, avoiding complex design. Based on this paradigm, we introduce Numina-Lean-Agent, which combines Claude Code with Numina-Lean-MCP to enable autonomous interaction with Lean, retrieval of relevant theorems, informal proving and auxiliary reasoning tools. Using Claude Opus 4.5 as the base model, Numina-Lean-Agent solves all problems in Putnam 2025 (12 / 12), matching the best closed-source system. Beyond benchmark evaluation, we further demonstrate its generality by interacting with mathematicians to successfully formalize the Brascamp-Lieb theorem. We release Numina-Lean-Agent and all solutions at https://github.com/project-numina/numina-lean-agent.
PDF91January 23, 2026