Eka-Eval: A Comprehensive Evaluation Framework for Large Language Models in Indian Languages

Abstract

The rapid advancement of Large Language Models (LLMs) has intensified the need for evaluation frameworks that go beyond English-centric benchmarks and address the requirements of linguistically diverse regions such as India. We present EKA-EVAL, a unified, production-ready evaluation framework that integrates over 35 benchmarks, including 10 Indic-specific datasets, spanning categories such as reasoning, mathematics, tool use, long-context understanding, and reading comprehension. Compared to existing Indian-language evaluation tools, EKA-EVAL offers broader benchmark coverage, with built-in support for distributed inference, quantization, and multi-GPU usage. Our systematic comparison positions EKA-EVAL as the first end-to-end, extensible evaluation suite tailored to both global and Indic LLMs, significantly lowering the barrier to multilingual benchmarking. The framework is open-source and publicly available at https://github.com/lingo-iitgn/eka-eval, and is part of the ongoing EKA initiative (https://eka.soket.ai), which aims to scale to over 100 benchmarks and establish a robust multilingual evaluation ecosystem for LLMs.
