Aryabhata: An exam-focused language model for JEE Math
August 12, 2025
Authors: Ritvik Rastogi, Sachin Dharashivkar, Sandeep Varma
cs.AI
Abstract
We present Aryabhata 1.0, a compact 7B parameter math reasoning
model optimized for the Indian academic exam, the Joint Entrance Examination
(JEE). Despite rapid progress in large language models (LLMs), current models
often remain unsuitable for educational use. Aryabhata 1.0 is built by merging
strong open-weight reasoning models, followed by supervised fine-tuning (SFT)
with curriculum learning on verified chain-of-thought (CoT) traces curated
through best-of-n rejection sampling. To further boost performance, we apply
reinforcement learning with verifiable rewards (RLVR) using an A2C objective with
group-relative advantage estimation, along with novel exploration strategies such
as Adaptive Group Resizing and Temperature Scaling.
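For readers unfamiliar with these terms, the sketch below illustrates the two mechanics the abstract names: best-of-n rejection sampling (keeping only verifier-approved chain-of-thought traces for SFT) and group-relative advantage estimation (normalizing verifiable rewards within a sampling group so the group mean replaces a learned critic baseline). This is a minimal illustrative sketch; the function names, the dict-based trace format, and the normalization constant are our assumptions, not the authors' released code.

```python
import numpy as np

# Best-of-n rejection sampling (toy version): keep only candidate
# chain-of-thought traces whose final answer a verifier accepts --
# here, a simple exact-match check against a reference answer.
def best_of_n_filter(candidates, reference_answer):
    return [c for c in candidates if c["answer"] == reference_answer]

# Group-relative advantage estimation: normalize each completion's
# reward against the statistics of its own sampling group, so the
# group mean acts as the baseline instead of a learned value function.
def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    baseline = rewards.mean()
    scale = rewards.std() + 1e-8  # guard against zero-variance groups
    return (rewards - baseline) / scale

# Toy example: 8 completions sampled for one problem, scored 1 when the
# verifiable reward fires (correct final answer) and 0 otherwise.
rewards = np.array([1, 0, 0, 1, 1, 0, 0, 0], dtype=np.float32)
print(group_relative_advantages(rewards))
# Correct samples receive positive advantages and incorrect ones
# negative; these would weight the A2C-style policy-gradient update.
```

In this scheme, groups where every sample is correct or every sample is wrong yield near-zero advantages and hence little learning signal, which is the kind of situation exploration strategies such as adaptive group resizing and temperature scaling are designed to mitigate.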
Evaluated on both in-distribution (JEE Main 2025) and out-of-distribution
(MATH, GSM8K) benchmarks, Aryabhata outperforms existing models in accuracy and
efficiency, while offering pedagogically useful step-by-step reasoning. We
release Aryabhata as a foundation model to advance exam-centric, open-source
small language models. This marks our first open release for community feedback
(Aryabhata 1.0 on Hugging Face: https://huggingface.co/PhysicsWallahAI/Aryabhata-1.0);
PW is actively training future models to further improve
learning outcomes for students.