IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding

Abstract

Spoken by more than 1.5 billion people in the Indian subcontinent, Indic languages present unique challenges and opportunities for natural language processing (NLP) research due to their rich cultural heritage, linguistic diversity, and complex structures. IndicMMLU-Pro is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) across Indic languages, building upon the MMLU-Pro (Massive Multitask Language Understanding) framework. It covers major languages such as Hindi, Bengali, Gujarati, Marathi, Kannada, Punjabi, Tamil, Telugu, and Urdu, and is designed around the linguistic diversity of the subcontinent. The benchmark encompasses a wide range of tasks in language comprehension, reasoning, and generation, crafted to capture the intricacies of Indian languages. IndicMMLU-Pro provides a standardized evaluation framework intended to advance research in Indic language AI and to facilitate the development of more accurate, efficient, and culturally sensitive models. This paper outlines the benchmark's design principles, task taxonomy, and data collection methodology, and presents baseline results from state-of-the-art multilingual models.