Aletheia tackles FirstProof autonomously

Abstract

We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the challenge's allowed timeframe, Aletheia autonomously solved 6 of the 10 problems (Problems 2, 5, 7, 8, 9, and 10) according to majority expert assessment; Problem 8 was the only one on which the experts were not unanimous. For full transparency, we explain our interpretation of FirstProof and disclose the details of our experiments and our evaluation. Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.
