Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2

February 5, 2025
Authors: Yuri Chervonyi, Trieu H. Trinh, Miroslav Olšák, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V. Le, Thang Luong
cs.AI

Abstract

We present AlphaGeometry2, a significantly improved version of AlphaGeometry introduced in Trinh et al. (2024), which has now surpassed an average gold medalist in solving Olympiad geometry problems. To achieve this, we first extend the original AlphaGeometry language to tackle harder problems involving movements of objects, and problems containing linear equations of angles, ratios, and distances. This, together with other additions, has markedly improved the coverage rate of the AlphaGeometry language on International Mathematical Olympiad (IMO) geometry problems from 2000-2024, from 66% to 88%. The search process of AlphaGeometry2 has also been greatly improved through the use of the Gemini architecture for better language modeling, and a novel knowledge-sharing mechanism that combines multiple search trees. Together with further enhancements to the symbolic engine and synthetic data generation, we have significantly boosted the overall solving rate of AlphaGeometry2 to 84% for all geometry problems over the last 25 years, compared to 54% previously. AlphaGeometry2 was also part of the system that achieved silver-medal standard at IMO 2024 (https://dpmd.ai/imo-silver). Last but not least, we report progress towards using AlphaGeometry2 as part of a fully automated system that reliably solves geometry problems directly from natural language input.