Aryabhata: An exam-focused language model for JEE Math

Abstract

We present Aryabhata 1.0, a compact 7B-parameter math reasoning model optimized for the Indian academic exam, the Joint Entrance Examination (JEE). Despite rapid progress in large language models (LLMs), current models often remain unsuitable for educational use. Aryabhata 1.0 is built by merging strong open-weight reasoning models, followed by supervised fine-tuning (SFT) with curriculum learning on verified chain-of-thought (CoT) traces curated through best-of-n rejection sampling. To further boost performance, we apply reinforcement learning with verifiable rewards (RLVR) using an A2C objective with group-relative advantage estimation, along with novel exploration strategies such as Adaptive Group Resizing and Temperature Scaling. Evaluated on both in-distribution (JEE Main 2025) and out-of-distribution (MATH, GSM8K) benchmarks, Aryabhata outperforms existing models in accuracy and efficiency, while offering pedagogically useful step-by-step reasoning. We release Aryabhata as a foundation model to advance exam-centric, open-source small language models. This marks our first open release for community feedback (Aryabhata 1.0 on Hugging Face: https://huggingface.co/PhysicsWallahAI/Aryabhata-1.0); PW is actively training future models to further improve learning outcomes for students.
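
As an illustration only, two of the ideas named in the abstract (keeping verified traces via rejection sampling, and group-relative advantage estimation under a verifiable reward) might be sketched as follows. This is a minimal sketch, not the authors' implementation: the exact-match reward, the function names, the trace dictionary layout, and the optional division by the group standard deviation are all assumptions made for the example, since the abstract does not specify them.

```python
from statistics import mean, pstdev


def verifiable_reward(predicted: str, reference: str) -> float:
    """Binary reward (assumed): 1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def keep_verified(traces, reference):
    """Rejection sampling: keep only the sampled traces whose final answer is verified."""
    return [t for t in traces if verifiable_reward(t["answer"], reference) == 1.0]


def group_relative_advantages(rewards, normalize=True, eps=1e-6):
    """Score each sample against the other samples drawn for the same question.

    The group mean serves as the baseline. Dividing by the group standard
    deviation is an assumed, optional normalization.
    """
    mu = mean(rewards)
    centered = [r - mu for r in rewards]
    if not normalize:
        return centered
    sigma = pstdev(rewards)
    return [c / (sigma + eps) for c in centered]


if __name__ == "__main__":
    # Hypothetical group of four sampled solutions to one question whose answer is "42".
    group = [
        {"cot": "trace A", "answer": "42"},
        {"cot": "trace B", "answer": "41"},
        {"cot": "trace C", "answer": "40"},
        {"cot": "trace D", "answer": "42"},
    ]
    rewards = [verifiable_reward(t["answer"], "42") for t in group]
    print(len(keep_verified(group, "42")), "verified traces kept")
    print(group_relative_advantages(rewards))
```

Under these assumptions, a group in which every sample receives the same reward yields zero advantage for all samples, so such a group contributes no learning signal.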
