OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

Abstract

In this report, we pose the following question: Who is the most intelligent AI model to date, as measured by the OlympicArena (an Olympic-level, multi-discipline, multi-modal benchmark for superintelligent AI)? We focus on the most recently released models: Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o. For the first time, we propose using an Olympic medal table approach to rank AI models based on their comprehensive performance across various disciplines. Empirical results reveal: (1) Claude-3.5-Sonnet is highly competitive with GPT-4o in overall performance, even surpassing GPT-4o on a few subjects (namely Physics, Chemistry, and Biology). (2) Gemini-1.5-Pro and GPT-4V are ranked consecutively just behind GPT-4o and Claude-3.5-Sonnet, but with a clear performance gap between them. (3) The performance of AI models from the open-source community lags significantly behind these proprietary models. (4) The performance of these models on this benchmark remains less than satisfactory, indicating that we still have a long way to go before achieving superintelligence. We remain committed to continuously tracking and evaluating the performance of the latest powerful models on this benchmark (available at https://github.com/GAIR-NLP/OlympicArena).
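
As an illustration of how a medal-table ranking can work, here is a minimal sketch in Python. It assumes that the top three models in each discipline receive gold, silver, and bronze, and that models are then ordered by gold count, then silver, then bronze; the abstract does not spell out these rules, so treat them as assumptions. The model names and scores are hypothetical placeholders, not results from the report.

```python
from collections import Counter

# Hypothetical per-discipline scores (illustrative only, NOT results from the report).
scores = {
    "Math":      {"model_a": 0.40, "model_b": 0.35, "model_c": 0.30, "model_d": 0.20},
    "Physics":   {"model_a": 0.30, "model_b": 0.38, "model_c": 0.33, "model_d": 0.25},
    "Chemistry": {"model_a": 0.45, "model_b": 0.50, "model_c": 0.41, "model_d": 0.48},
}

MEDALS = ("gold", "silver", "bronze")


def medal_table(scores):
    """Return (model, gold, silver, bronze) rows, best-ranked model first.

    Assumed rule: in each discipline the three highest-scoring models get
    gold, silver, and bronze; the table is sorted by gold count, then
    silver, then bronze.
    """
    tally = {}
    for by_model in scores.values():
        # Make sure every model appears in the table, even with no medals.
        for model in by_model:
            tally.setdefault(model, Counter())
        # Rank models within this discipline from highest to lowest score.
        ranked = sorted(by_model, key=by_model.get, reverse=True)
        for medal, model in zip(MEDALS, ranked):
            tally[model][medal] += 1

    def sort_key(model):
        counts = tally[model]
        return (counts["gold"], counts["silver"], counts["bronze"])

    order = sorted(tally, key=sort_key, reverse=True)
    return [(m, *sort_key(m)) for m in order]


if __name__ == "__main__":
    for model, gold, silver, bronze in medal_table(scores):
        print(f"{model}: gold={gold} silver={silver} bronze={bronze}")
```

Ordering by medal counts rather than by an averaged score rewards a model for leading individual disciplines. This sketch does not define a tie-breaker for models with identical medal counts.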


