Aryabhata: An Exam-Focused Language Model for JEE Math

Abstract

We present Aryabhata 1.0, a compact 7B-parameter math reasoning model optimized for the Joint Entrance Examination (JEE), an Indian academic exam. Despite rapid progress in large language models (LLMs), current models often remain unsuitable for educational use. Aryabhata 1.0 is built by merging strong open-weight reasoning models and then applying supervised fine-tuning (SFT) with curriculum learning on verified chain-of-thought (CoT) traces curated through best-of-n rejection sampling. To further boost performance, we apply reinforcement learning with verifiable rewards (RLVR) using an A2C objective with group-relative advantage estimation, along with novel exploration strategies such as Adaptive Group Resizing and Temperature Scaling. Evaluated on both in-distribution (JEE Main 2025) and out-of-distribution (MATH, GSM8K) benchmarks, Aryabhata outperforms existing models in accuracy and efficiency while offering pedagogically useful step-by-step reasoning. We release Aryabhata as a foundation model to advance exam-centric, open-source small language models. This is our first open release for community feedback (Aryabhata 1.0 on Hugging Face: https://huggingface.co/PhysicsWallahAI/Aryabhata-1.0); PW is actively training future models to further improve learning outcomes for students.
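For readers unfamiliar with the data-curation and RLVR terms above, the following is a minimal, generic sketch of three of the building blocks: a binary verifiable reward, a best-of-n rejection filter, and GRPO-style group-relative advantage normalization. It is not the authors' implementation. The function names, the exact-string answer check, and the `eps` stabilizer are illustrative assumptions, and the sketch does not cover the A2C policy loss, Adaptive Group Resizing, or Temperature Scaling.

```python
import statistics
from typing import List


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the extracted final answer matches
    the reference after trimming whitespace, else 0.0. A real checker would
    normalize math expressions more carefully."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def best_of_n_filter(traces: List[str], answers: List[str], reference: str) -> List[str]:
    """Rejection sampling step: of n sampled CoT traces for one problem,
    keep only those whose extracted answer is verified correct."""
    return [t for t, a in zip(traces, answers) if verifiable_reward(a, reference) == 1.0]


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Score each sample relative to the other samples for the same prompt:
    A_i = (r_i - mean(r)) / (std(r) + eps).
    If every sample in the group gets the same reward, all advantages are 0."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled solutions to one problem; two are verified correct.
    rewards = [verifiable_reward(a, "42") for a in ["42", "41", "7", " 42 "]]
    print(rewards)                             # [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))  # roughly [1.0, -1.0, -1.0, 1.0]
```

Normalizing within the group means no separate value network is needed to estimate a baseline; the group mean plays that role. A side effect of this formula is that a prompt where all samples are correct (or all wrong) contributes zero advantage.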
