Towards Autonomous Mathematics Research

February 10, 2026
Authors: Tony Feng, Trieu H. Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung, Joonkyung Lee, Carlo Pagano, Sang-hyun Kim, Federico Pasqualotto, Sergei Gukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi, Yi Tay, YaGuang Li, Chenkai Kuang, Yuan Liu, Hanzhao Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-tze Cheng, Demis Hassabis, Koray Kavukcuoglu, Quoc V. Le, Thang Luong
cs.AI

Abstract

Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-level problem-solving to professional research, however, requires navigating vast literature and constructing long-horizon proofs. In this work, we introduce Aletheia, a math research agent that iteratively generates, verifies, and revises solutions end-to-end in natural language. Specifically, Aletheia is powered by an advanced version of Gemini Deep Think for challenging reasoning problems, a novel inference-time scaling law that extends beyond Olympiad-level problems, and intensive tool use to navigate the complexities of mathematical research. We demonstrate the capability of Aletheia from Olympiad problems to PhD-level exercises and, most notably, through several distinct milestones in AI-assisted mathematics research: (a) a research paper (Feng26) generated by AI without any human intervention in calculating certain structure constants in arithmetic geometry called eigenweights; (b) a research paper (LeeSeo26) demonstrating human-AI collaboration in proving bounds on systems of interacting particles called independent sets; and (c) an extensive semi-autonomous evaluation (Feng et al., 2026a) of 700 open problems on Bloom's Erdős Conjectures database, including autonomous solutions to four open questions. In order to help the public better understand the developments pertaining to AI and mathematics, we suggest codifying standard levels quantifying autonomy and novelty of AI-assisted results. We conclude with reflections on human-AI collaboration in mathematics.
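
The generate, verify, and revise cycle mentioned in the abstract can be illustrated with a minimal sketch. This is not Aletheia's implementation, which the abstract does not detail; the names `solve_iteratively`, `Verdict`, and the toy square-root task are hypothetical stand-ins chosen only to show the control flow of such a loop.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool        # did the candidate pass verification?
    feedback: str   # hint passed to the reviser when it did not


def solve_iteratively(
    problem: int,
    generate: Callable[[int], int],
    verify: Callable[[int, int], Verdict],
    revise: Callable[[int, int, str], int],
    max_rounds: int = 10,
) -> Optional[int]:
    """Hypothetical generate/verify/revise loop; returns None if no
    candidate is verified within max_rounds."""
    candidate = generate(problem)
    for _ in range(max_rounds):
        verdict = verify(problem, candidate)
        if verdict.ok:
            return candidate
        candidate = revise(problem, candidate, verdict.feedback)
    return None


# Toy stand-ins (assumptions for illustration): find an exact integer
# square root of a positive integer n.
def toy_generate(n: int) -> int:
    return n  # deliberately crude first attempt


def toy_verify(n: int, c: int) -> Verdict:
    if c * c == n:
        return Verdict(True, "")
    return Verdict(False, "too high" if c * c > n else "too low")


def toy_revise(n: int, c: int, feedback: str) -> int:
    # Integer Newton step; the feedback string is unused in this toy.
    return (c + n // c) // 2


result = solve_iteratively(49, toy_generate, toy_verify, toy_revise)
no_result = solve_iteratively(50, toy_generate, toy_verify, toy_revise)
```

Here `result` is 7, reached after four revisions, while `no_result` is `None` because 50 has no exact integer square root and the loop gives up after `max_rounds`. In a research agent the three callables would presumably be model calls operating on natural-language proofs rather than integers.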
February 13, 2026