Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Abstract

Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). Our approach runs counter to the dominant strategy of scaling model parameters to improve capability, as exemplified by DeepSeek R1 (671B) and Kimi K2 (>1T). The SSP framework first applies Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, then applies MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal. With a total training cost of only $7,800, VibeThinker-1.5B demonstrates stronger reasoning than closed-source models such as Magistral Medium and Claude Opus 4, and performs on par with open-source models such as GPT OSS-20B Medium. Remarkably, it surpasses DeepSeek R1, which has over 400x more parameters, on three math benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7). These scores are a substantial improvement over those of its base model (6.7, 4.3, and 0.6, respectively). On LiveCodeBench V6, it scores 51.1, outperforming Magistral Medium (50.3) and its base model (0.0). These findings demonstrate that small models can achieve reasoning capabilities comparable to those of large models while drastically reducing training and inference costs, thereby helping to democratize advanced AI research.
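The abstract names the two SSP stages but does not give their formulas. The Python sketch below is an illustrative reading under stated assumptions, not the authors' implementation. It assumes that the "spectrum" stage is judged by pass@k (the chance that at least one of k sampled solutions is correct), computed with the standard unbiased estimator, and that "MaxEnt-guided" means weighting each training problem by how close the policy's empirical accuracy p is to 0.5, where the entropy of a Bernoulli(p) outcome is maximal. The functions `pass_at_k`, `bernoulli_entropy`, `kl_to_half`, and `maxent_weight`, the weight form exp(-λ·KL(Bernoulli(p) || Bernoulli(0.5))), the value λ = 2.0, and the rollout counts in the demo are all hypothetical choices made for illustration.

```python
import math
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k samples,
    drawn without replacement from n generations of which c are correct,
    is correct. Used here as an (assumed) 'spectrum' / diversity metric."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def bernoulli_entropy(p: float) -> float:
    """Entropy in nats of a Bernoulli(p) outcome; maximal (ln 2) at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def kl_to_half(p: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(0.5)), which simplifies to ln 2 - H(p)."""
    return math.log(2.0) - bernoulli_entropy(p)


def maxent_weight(p: float, lam: float = 2.0) -> float:
    """Hypothetical per-problem weight: 1.0 when accuracy p = 0.5 (maximum
    uncertainty), decaying to 2**(-lam) when the problem is always or never solved."""
    return math.exp(-lam * kl_to_half(p))


if __name__ == "__main__":
    # Hypothetical (rollouts n, correct c) for four problems in one RL batch.
    groups = {"easy": (8, 8), "medium": (8, 4), "hard": (8, 1), "unsolved": (8, 0)}
    for name, (n, c) in groups.items():
        p = c / n
        print(
            f"{name:9s} pass@1={pass_at_k(n, c, 1):.3f} "
            f"pass@8={pass_at_k(n, c, 8):.3f} weight={maxent_weight(p):.3f}"
        )
```

Under these assumptions, problems the model always or never solves receive weight 2^(-λ) = 0.25, while a problem solved in half of its rollouts receives full weight, so RL updates concentrate where the outcome is most uncertain and the learning signal is richest.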
