Towards Autonomous Mathematics Research

February 10, 2026
Authors: Tony Feng, Trieu H. Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung, Joonkyung Lee, Carlo Pagano, Sang-hyun Kim, Federico Pasqualotto, Sergei Gukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi, Yi Tay, YaGuang Li, Chenkai Kuang, Yuan Liu, Hanzhao Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-tze Cheng, Demis Hassabis, Koray Kavukcuoglu, Quoc V. Le, Thang Luong
cs.AI

Abstract

Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-level problem-solving to professional research, however, requires navigating vast literature and constructing long-horizon proofs. In this work, we introduce Aletheia, a math research agent that iteratively generates, verifies, and revises solutions end-to-end in natural language. Specifically, Aletheia is powered by an advanced version of Gemini Deep Think for challenging reasoning problems, a novel inference-time scaling law that extends beyond Olympiad-level problems, and intensive tool use to navigate the complexities of mathematical research. We demonstrate the capability of Aletheia from Olympiad problems to PhD-level exercises and, most notably, through several distinct milestones in AI-assisted mathematics research: (a) a research paper (Feng26) generated by AI without any human intervention in calculating certain structure constants in arithmetic geometry called eigenweights; (b) a research paper (LeeSeo26) demonstrating human-AI collaboration in proving bounds on systems of interacting particles called independent sets; and (c) an extensive semi-autonomous evaluation (Feng et al., 2026a) of 700 open problems on Bloom's Erdős Conjectures database, including autonomous solutions to four open questions. To help the public better understand developments in AI and mathematics, we suggest codifying standard levels that quantify the autonomy and novelty of AI-assisted results. We conclude with reflections on human-AI collaboration in mathematics.
February 13, 2026