OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?
June 24, 2024
Authors: Zhen Huang, Zengzhi Wang, Shijie Xia, Pengfei Liu
cs.AI
Abstract
In this report, we pose the following question: Who is the most intelligent
AI model to date, as measured by the OlympicArena (an Olympic-level,
multi-discipline, multi-modal benchmark for superintelligent AI)? We
specifically focus on the most recently released models: Claude-3.5-Sonnet,
Gemini-1.5-Pro, and GPT-4o. For the first time, we propose using an Olympic
medal table approach to rank AI models based on their comprehensive performance
across various disciplines. Empirical results reveal: (1) Claude-3.5-Sonnet
is highly competitive with GPT-4o in overall performance, even surpassing
GPT-4o on a few subjects (i.e., Physics, Chemistry, and Biology). (2)
Gemini-1.5-Pro and GPT-4V are ranked consecutively just behind GPT-4o and
Claude-3.5-Sonnet, but with a clear performance gap between them. (3) The
performance of AI models from the open-source community significantly lags
behind these proprietary models. (4) The performance of these models on this
benchmark has been less than satisfactory, indicating that we still have a long
way to go before achieving superintelligence. We remain committed to
continuously tracking and evaluating the performance of the latest powerful
models on this benchmark (available at
https://github.com/GAIR-NLP/OlympicArena).
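The medal-table ranking mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional Olympic ordering: the top three models in each discipline receive gold, silver, and bronze, and models are then ordered by gold count, then silver, then bronze, with mean score as a final tie-breaker. The abstract does not spell out these rules, so treat them as assumptions. The model names, disciplines, and scores below are hypothetical placeholders, not results from the report.

```python
# Hypothetical per-discipline accuracies (placeholders, NOT results from the report).
scores = {
    "model_a": {"math": 0.40, "physics": 0.30, "chemistry": 0.50},
    "model_b": {"math": 0.35, "physics": 0.45, "chemistry": 0.55},
    "model_c": {"math": 0.20, "physics": 0.25, "chemistry": 0.30},
    "model_d": {"math": 0.10, "physics": 0.15, "chemistry": 0.20},
}


def medal_table(scores):
    """Rank models by (gold, silver, bronze, mean score), highest first.

    Assumed rules: the top three models per discipline earn gold, silver,
    and bronze. Ties within a discipline are not handled specially; they
    fall back to the input order because Python's sort is stable.
    """
    medals = {model: [0, 0, 0] for model in scores}  # [gold, silver, bronze]
    disciplines = sorted({d for per_model in scores.values() for d in per_model})

    for discipline in disciplines:
        ranked = sorted(scores, key=lambda m: scores[m][discipline], reverse=True)
        for place, model in enumerate(ranked[:3]):
            medals[model][place] += 1

    def mean_score(model):
        values = scores[model].values()
        return sum(values) / len(values)

    rows = [(m, *medals[m], mean_score(m)) for m in scores]
    # Compare gold first, then silver, then bronze, then mean score.
    return sorted(rows, key=lambda r: (r[1], r[2], r[3], r[4]), reverse=True)


table = medal_table(scores)
for model, gold, silver, bronze, mean in table:
    print(f"{model}: gold={gold} silver={silver} bronze={bronze} mean={mean:.3f}")
```

Ranking by medals rather than by a single averaged score rewards a model for leading individual disciplines, which is why the sketch uses the mean only to break ties.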