Aletheia tackles FirstProof autonomously

Abstract

We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the time allowed by the challenge, Aletheia autonomously solved 6 of the 10 problems (Problems 2, 5, 7, 8, 9, and 10) according to majority expert assessment. Problem 8 was the only one on which the experts were not unanimous. For full transparency, we explain our interpretation of FirstProof and disclose the details of our experiments and our evaluation. Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.
