Aryabhata: An exam-focused language model for JEE Math

August 12, 2025
Authors: Ritvik Rastogi, Sachin Dharashivkar, Sandeep Varma
cs.AI

Abstract

We present Aryabhata 1.0, a compact 7B-parameter math reasoning model optimized for the Indian academic exam, the Joint Entrance Examination (JEE). Despite rapid progress in large language models (LLMs), current models often remain unsuitable for educational use. Aryabhata 1.0 is built by merging strong open-weight reasoning models, followed by supervised fine-tuning (SFT) with curriculum learning on verified chain-of-thought (CoT) traces curated through best-of-n rejection sampling. To further boost performance, we apply reinforcement learning with verifiable rewards (RLVR) using an A2C objective with group-relative advantage estimation, along with novel exploration strategies such as Adaptive Group Resizing and Temperature Scaling. Evaluated on both in-distribution (JEE Main 2025) and out-of-distribution (MATH, GSM8K) benchmarks, Aryabhata outperforms existing models in accuracy and efficiency, while offering pedagogically useful step-by-step reasoning. We release Aryabhata as a foundation model to advance exam-centric, open-source small language models. This marks our first open release for community feedback (Aryabhata 1.0 on Hugging Face: https://huggingface.co/PhysicsWallahAI/Aryabhata-1.0); PW is actively training future models to further improve learning outcomes for students.
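To make the SFT data-curation step concrete, here is a minimal Python sketch of best-of-n rejection sampling for CoT traces. The `generate` and `extract_answer` callables are hypothetical placeholders, not the authors' actual pipeline; the sketch only illustrates the keep-if-verified logic the abstract describes.

    from typing import Callable, Optional

    def best_of_n_rejection_sample(
        problem: str,
        reference_answer: str,
        generate: Callable[[str], str],        # hypothetical: samples one CoT trace
        extract_answer: Callable[[str], str],  # hypothetical: pulls the final answer from a trace
        n: int = 8,
    ) -> Optional[str]:
        """Sample up to n chain-of-thought traces and keep the first one whose
        final answer matches the reference; return None if all n attempts fail."""
        for _ in range(n):
            trace = generate(problem)
            if extract_answer(trace) == reference_answer:
                return trace  # verified trace is added to the curated SFT set
        return None  # problem is rejected from the curated data

Problems for which no sampled trace verifies are simply dropped, so the curated set contains only traces whose final answers check out against the reference.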
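The group-relative advantage estimation used in the RLVR stage can likewise be sketched. Assuming a GRPO-style formulation, where each prompt's group of sampled responses is scored by a verifiable reward and each advantage is the reward normalized by group statistics, the snippet below is an illustrative sketch under that assumption, not the exact objective used to train Aryabhata.

    import numpy as np

    def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> np.ndarray:
        """Per-response advantages relative to the group sampled for the same
        prompt: A_i = (r_i - mean(r)) / (std(r) + eps)."""
        r = np.asarray(rewards, dtype=np.float64)
        return (r - r.mean()) / (r.std() + eps)

    # Example: one prompt, a group of 4 sampled solutions scored by a
    # verifiable reward (1.0 if the final answer checks out, else 0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # approx. [ 1. -1. -1.  1.]

Because the advantage is centered within each group, responses are reinforced only insofar as they outperform other samples for the same problem, which is what makes a binary verifiable reward usable as a training signal.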