IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding

Abstract

Spoken by more than 1.5 billion people in the Indian subcontinent, Indic languages present unique challenges and opportunities for natural language processing (NLP) research due to their rich cultural heritage, linguistic diversity, and complex structures. IndicMMLU-Pro is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) across Indic languages, building upon the MMLU Pro (Massive Multitask Language Understanding) framework. Our benchmark covers major languages such as Hindi, Bengali, Gujarati, Marathi, Kannada, Punjabi, Tamil, Telugu, and Urdu. It encompasses a wide range of tasks in language comprehension, reasoning, and generation, carefully crafted to capture the intricacies of Indian languages. IndicMMLU-Pro provides a standardized evaluation framework to advance research in Indic language AI, facilitating the development of more accurate, efficient, and culturally sensitive models. This paper outlines the benchmark's design principles, task taxonomy, and data collection methodology, and presents baseline results from state-of-the-art multilingual models.