Evaluating Multimodal Generative AI with Korean Educational Standards

Abstract

This paper presents the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate multimodal generative AI systems using Korean national educational tests. KoNET comprises four exams: the Korean Elementary General Educational Development Test (KoEGED), its Middle (KoMGED) and High (KoHGED) school counterparts, and the Korean College Scholastic Ability Test (KoCSAT). These exams are known for their rigorous standards and diverse question types, which allows a comprehensive analysis of AI performance across different educational levels. By focusing on Korean, KoNET provides insights into model performance in less-explored languages. We assess a range of models (open-source, open-access, and closed APIs) by examining difficulty levels, subject diversity, and human error rates. The code and dataset builder will be fully open-sourced at https://github.com/naver-ai/KoNET.
