AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale

Abstract

We present AM-Thinking-v1, a 32B dense language model that advances the frontier of reasoning and embodies the collaborative spirit of open-source innovation. Outperforming DeepSeek-R1 and rivaling leading Mixture-of-Experts (MoE) models such as Qwen3-235B-A22B and Seed1.5-Thinking, AM-Thinking-v1 scores 85.3 on AIME 2024, 74.4 on AIME 2025, and 70.3 on LiveCodeBench, showing state-of-the-art mathematical and coding capabilities among open-source models of similar scale. Built entirely from the open-source Qwen2.5-32B base model and publicly available queries, AM-Thinking-v1 relies on a meticulously crafted post-training pipeline that combines supervised fine-tuning and reinforcement learning to deliver its reasoning capabilities. This work demonstrates that the open-source community can achieve high performance at the 32B scale, a practical sweet spot for deployment and fine-tuning. By balancing top-tier performance with real-world usability, we hope AM-Thinking-v1 inspires further collaborative efforts to harness mid-scale models, pushing the boundaries of reasoning while keeping accessibility at the core of innovation. We have open-sourced our model on Hugging Face: https://huggingface.co/a-m-team/AM-Thinking-v1