Aletheia tackles FirstProof autonomously

February 24, 2026
Authors: Tony Feng, Junehyuk Jung, Sang-hyun Kim, Carlo Pagano, Sergei Gukov, Chiang-Chiang Tsai, David Woodruff, Adel Javanmard, Aryan Mokhtari, Dawsen Hwang, Yuri Chervonyi, Jonathan N. Lee, Garrett Bingham, Trieu H. Trinh, Vahab Mirrokni, Quoc V. Le, Thang Luong
cs.AI

Abstract

We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the allowed timeframe of the challenge, Aletheia autonomously solved 6 of the 10 problems (Problems 2, 5, 7, 8, 9, and 10) according to majority expert assessments; Problem 8 was the only one on which the experts were not unanimous. For full transparency, we explain our interpretation of FirstProof and disclose details about our experiments as well as our evaluation. Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.