From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
July 11, 2025
Authors: Seokhee Hong, Sunkyoung Kim, Guijin Son, Soyeon Kim, Yeonjung Hong, Jinsik Lee
cs.AI
Abstract
The development of Large Language Models (LLMs) requires robust benchmarks that encompass not only academic domains but also industrial fields to effectively evaluate their applicability in real-world scenarios. In this paper, we introduce two Korean expert-level benchmarks. KMMLU-Redux, reconstructed from the existing KMMLU, consists of questions from the Korean National Technical Qualification exams, with critical errors removed to enhance reliability. KMMLU-Pro is based on Korean National Professional Licensure exams to reflect professional knowledge in Korea. Our experiments demonstrate that these benchmarks comprehensively represent industrial knowledge in Korea. We release our datasets publicly.