From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
July 11, 2025
Authors: Seokhee Hong, Sunkyoung Kim, Guijin Son, Soyeon Kim, Yeonjung Hong, Jinsik Lee
cs.AI
Abstract
The development of Large Language Models (LLMs) requires robust benchmarks that encompass not only academic domains but also industrial fields to effectively evaluate their applicability in real-world scenarios. In this paper, we introduce two Korean expert-level benchmarks. KMMLU-Redux, reconstructed from the existing KMMLU, consists of questions from the Korean National Technical Qualification exams, with critical errors removed to enhance reliability. KMMLU-Pro is based on Korean National Professional Licensure exams to reflect professional knowledge in Korea. Our experiments demonstrate that these benchmarks comprehensively represent industrial knowledge in Korea. We release our dataset publicly.