
Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages

July 2, 2025
作者: Samridhi Raj Sinha, Rajvee Sheth, Abhishek Upperwal, Mayank Singh
cs.AI

Abstract

The rapid advancement of Large Language Models (LLMs) has intensified the need for evaluation frameworks that go beyond English-centric benchmarks and address the requirements of linguistically diverse regions such as India. We present EKA-EVAL, a unified and production-ready evaluation framework that integrates over 35 benchmarks, including 10 Indic-specific datasets, spanning categories such as reasoning, mathematics, tool use, long-context understanding, and reading comprehension. Compared to existing Indian-language evaluation tools, EKA-EVAL offers broader benchmark coverage, with built-in support for distributed inference, quantization, and multi-GPU usage. Our systematic comparison positions EKA-EVAL as the first end-to-end, extensible evaluation suite tailored for both global and Indic LLMs, significantly lowering the barrier to multilingual benchmarking. The framework is open-source and publicly available at https://github.com/lingo-iitgn/eka-eval and is part of the ongoing EKA initiative (https://eka.soket.ai), which aims to scale up to over 100 benchmarks and establish a robust, multilingual evaluation ecosystem for LLMs.