Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages

July 2, 2025
Authors: Samridhi Raj Sinha, Rajvee Sheth, Abhishek Upperwal, Mayank Singh
cs.AI

Abstract

The rapid advancement of Large Language Models (LLMs) has intensified the need for evaluation frameworks that go beyond English-centric benchmarks and address the requirements of linguistically diverse regions such as India. We present EKA-EVAL, a unified and production-ready evaluation framework that integrates over 35 benchmarks, including 10 Indic-specific datasets, spanning categories such as reasoning, mathematics, tool use, long-context understanding, and reading comprehension. Compared to existing Indian-language evaluation tools, EKA-EVAL offers broader benchmark coverage, with built-in support for distributed inference, quantization, and multi-GPU usage. Our systematic comparison positions EKA-EVAL as the first end-to-end, extensible evaluation suite tailored for both global and Indic LLMs, significantly lowering the barrier to multilingual benchmarking. The framework is open source and publicly available at https://github.com/lingo-iitgn/eka-eval, and is part of the ongoing EKA initiative (https://eka.soket.ai), which aims to scale up to over 100 benchmarks and establish a robust, multilingual evaluation ecosystem for LLMs.
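As an illustration of the kind of workload such a framework automates, the sketch below scores a toy multiple-choice item by comparing answer log-likelihoods under a causal LM loaded with 4-bit quantization and sharded across available GPUs. This is a minimal, generic example built on the Hugging Face transformers API, not the actual eka-eval interface; the model ID and the benchmark item are placeholders (see the repository for real usage).

```python
# Minimal sketch (NOT the eka-eval API): quantized, multi-GPU log-likelihood
# scoring of a toy multiple-choice item. Requires a CUDA GPU plus the
# transformers, accelerate, and bitsandbytes packages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "facebook/opt-125m"  # placeholder; substitute an Indic/multilingual LLM

# 4-bit quantization keeps memory low; device_map="auto" lets accelerate
# shard the model's layers across all visible GPUs.
quant_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=quant_config, device_map="auto"
)

def log_likelihood(prompt: str, answer: str) -> float:
    """Sum of log-probs the model assigns to `answer` given `prompt`.

    Assumes tokenizing prompt + answer splits cleanly at the prompt
    boundary, which holds for simple text but not universally.
    """
    full = tokenizer(prompt + answer, return_tensors="pt").to(model.device)
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(**full).logits
    # Logits at position i predict token i+1, so shift by one to score
    # only the answer tokens.
    answer_ids = full.input_ids[0, prompt_len:]
    answer_logits = logits[0, prompt_len - 1 : -1]
    logprobs = torch.log_softmax(answer_logits.float(), dim=-1)
    idx = torch.arange(answer_ids.numel(), device=logprobs.device)
    return logprobs[idx, answer_ids.to(logprobs.device)].sum().item()

# Toy item; a real harness would load benchmark datasets from disk or the Hub.
item = {"question": "2 + 2 = ?", "choices": ["3", "4", "5"], "answer": 1}
prompt = f"Question: {item['question']}\nAnswer: "
scores = [log_likelihood(prompt, c) for c in item["choices"]]
pred = max(range(len(scores)), key=scores.__getitem__)
print("correct" if pred == item["answer"] else "wrong")
```

A production framework wraps this same loop with dataset loaders, prompt templates per benchmark, and aggregation of accuracy across many tasks; the log-likelihood comparison shown here is the standard scoring rule for multiple-choice evaluation.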