OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

Abstract

In this report, we pose the following question: which AI model is the most intelligent to date, as measured by OlympicArena (an Olympic-level, multi-discipline, multi-modal benchmark for superintelligent AI)? We focus on the most recently released models: Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o. For the first time, we propose using an Olympic medal table to rank AI models by their comprehensive performance across disciplines. Empirical results reveal: (1) Claude-3.5-Sonnet is highly competitive with GPT-4o in overall performance and even surpasses it on a few subjects (namely Physics, Chemistry, and Biology). (2) Gemini-1.5-Pro and GPT-4V rank consecutively just behind GPT-4o and Claude-3.5-Sonnet, but with a clear performance gap between them. (3) AI models from the open-source community lag significantly behind these proprietary models. (4) The performance of all these models on the benchmark is less than satisfactory, indicating that we still have a long way to go before achieving superintelligence. We remain committed to continuously tracking and evaluating the latest powerful models on this benchmark (available at https://github.com/GAIR-NLP/OlympicArena).
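To make the medal-table idea concrete, here is a minimal sketch of how such a ranking could be computed. It assumes the usual Olympic convention (the top three models per subject receive gold, silver, and bronze; models are ordered by gold count, then silver, then bronze), which the abstract does not spell out. The function name `medal_table`, the model names, and all scores are hypothetical placeholders, not results from the benchmark.

```python
from collections import Counter

MEDALS = ("gold", "silver", "bronze")


def medal_table(scores):
    """Build a medal table from {subject: {model: score}}.

    Returns a list of (model, golds, silvers, bronzes) tuples, best first.
    Assumed convention: rank by golds, then silvers, then bronzes.
    """
    tally = {}
    for by_model in scores.values():
        # Make sure every model appears in the table, even with no medals.
        for model in by_model:
            tally.setdefault(model, Counter())
        # Highest score first; ties broken by model name (an assumption).
        ranked = sorted(by_model, key=lambda m: (-by_model[m], m))
        # zip stops after three entries, so only the top three get medals.
        for medal, model in zip(MEDALS, ranked):
            tally[model][medal] += 1

    rows = [(m, c["gold"], c["silver"], c["bronze"]) for m, c in tally.items()]
    rows.sort(key=lambda r: (-r[1], -r[2], -r[3], r[0]))
    return rows


# Hypothetical placeholder scores, NOT taken from the report.
example_scores = {
    "physics": {"model_a": 0.40, "model_b": 0.35, "model_c": 0.30, "model_d": 0.10},
    "chemistry": {"model_a": 0.50, "model_b": 0.55, "model_c": 0.45, "model_d": 0.20},
    "math": {"model_a": 0.30, "model_b": 0.25, "model_c": 0.28, "model_d": 0.05},
}

table = medal_table(example_scores)
for model, gold, silver, bronze in table:
    print(f"{model}: {gold} gold, {silver} silver, {bronze} bronze")
```

A medal table rewards winning individual disciplines rather than a high average, so a model that leads in several subjects can outrank one with a slightly better mean score. The report's exact tie-breaking rule may differ from the one assumed here.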
