Aryabhata: An exam-focused language model for JEE Math

Abstract

We present Aryabhata 1.0, a compact 7B-parameter math reasoning model optimized for the Indian academic exam, the Joint Entrance Examination (JEE). Despite rapid progress in large language models (LLMs), current models often remain unsuitable for educational use. Aryabhata 1.0 is built by merging strong open-weight reasoning models, followed by supervised fine-tuning (SFT) with curriculum learning on verified chain-of-thought (CoT) traces curated through best-of-n rejection sampling. To further boost performance, we apply reinforcement learning with verifiable rewards (RLVR) using the A2C objective with group-relative advantage estimation, along with novel exploration strategies such as Adaptive Group Resizing and Temperature Scaling. Evaluated on both in-distribution (JEE Main 2025) and out-of-distribution (MATH, GSM8K) benchmarks, Aryabhata outperforms existing models in accuracy and efficiency, while offering pedagogically useful step-by-step reasoning. We release Aryabhata as a foundation model to advance exam-centric, open-source small language models. This marks our first open release for community feedback (Aryabhata 1.0 on Hugging Face: https://huggingface.co/PhysicsWallahAI/Aryabhata-1.0); PW is actively training future models to further improve learning outcomes for students.
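
As a rough illustration of the group-relative advantage estimation mentioned in the abstract, the sketch below scores a group of sampled answers to one problem with a binary verifiable reward and then centers each reward on its group mean. The function names, the exact-match reward, and the division by the group standard deviation are assumptions made for this example; the abstract does not specify the reward function or the normalization actually used.

```python
from statistics import mean, pstdev


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 on exact match with the reference answer."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Center each reward on its group mean and scale by the group std.

    The std scaling is an assumption of this sketch, not a detail taken
    from the abstract. eps avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled final answers to one problem whose reference answer is "42".
answers = ["42", "41", "42", "7"]
rewards = [verifiable_reward(a, "42") for a in answers]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Correct samples receive a positive advantage and incorrect ones a negative advantage. A group in which every sample gets the same reward yields zero advantage for all of them, so it contributes no learning signal.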
