Eka-Eval: A Comprehensive Evaluation Framework for Large Language Models in Indian Languages

Abstract

The rapid advancement of Large Language Models (LLMs) has intensified the need for evaluation frameworks that go beyond English-centric benchmarks and address the requirements of linguistically diverse regions such as India. We present EKA-EVAL, a unified, production-ready evaluation framework that integrates over 35 benchmarks, including 10 Indic-specific datasets, spanning categories such as reasoning, mathematics, tool use, long-context understanding, and reading comprehension. Compared to existing Indian language evaluation tools, EKA-EVAL offers broader benchmark coverage, with built-in support for distributed inference, quantization, and multi-GPU usage. Our systematic comparison positions EKA-EVAL as the first end-to-end, extensible evaluation suite tailored for both global and Indic LLMs, significantly lowering the barrier to multilingual benchmarking. The framework is open-source and publicly available at https://github.com/lingo-iitgn/eka-eval, and is part of the ongoing EKA initiative (https://eka.soket.ai), which aims to scale up to over 100 benchmarks and establish a robust, multilingual evaluation ecosystem for LLMs.
