Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
November 9, 2025
Authors: Sen Xu, Yi Zhou, Wei Wang, Jixin Min, Zhibin Yin, Yingwei Dai, Shixi Liu, Lianyu Pang, Yirong Chen, Junlin Zhang
cs.AI
Abstract
Challenging the prevailing consensus that small models inherently lack robust
reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense
model developed via our Spectrum-to-Signal Principle (SSP). This calls into
question the dominant practice of scaling model parameters to enhance
capabilities, exemplified by DeepSeek R1 (671B) and Kimi k2 (>1T). The SSP framework
first employs a Two-Stage Diversity-Exploring Distillation (SFT) to generate a
broad spectrum of solutions, followed by MaxEnt-Guided Policy Optimization (RL)
to amplify the correct signal. With a total training cost of only $7,800,
VibeThinker-1.5B demonstrates superior reasoning capabilities compared to
closed-source models like Magistral Medium and Claude Opus 4, and performs on
par with open-source models like GPT OSS-20B Medium. Remarkably, it surpasses
the 400x larger DeepSeek R1 on three math benchmarks: AIME24 (80.3 vs. 79.8),
AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7). This is a substantial
improvement over its base model (6.7, 4.3, and 0.6, respectively). On
LiveCodeBench V6, it scores 51.1, outperforming Magistral Medium's 50.3 and its
base model's 0.0. These findings demonstrate that small models can achieve
reasoning capabilities comparable to large models, drastically reducing
training and inference costs and thereby democratizing advanced AI research.
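The abstract does not spell out how MaxEnt-Guided Policy Optimization shapes its updates, so the sketch below is only one plausible, hypothetical reading: it assumes a GRPO-style setup in which several solutions are sampled per problem, rewards are binary correctness checks, and the entropy of the pass rate scales the group-relative advantages so that training concentrates on problems the model solves roughly half the time. The function names and the weighting scheme are illustrative assumptions, not the paper's actual method.

import numpy as np

def binary_entropy(p: float) -> float:
    """Shannon entropy (in nats) of a Bernoulli(p) correctness outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)))

def maxent_weighted_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages scaled by pass-rate entropy (hypothetical).

    rewards: shape (K,), 1.0 if a sampled solution is verified correct, else 0.0.
    Returns one advantage value per sampled solution for a policy-gradient step.
    """
    pass_rate = float(rewards.mean())
    # Group-relative baseline (a GRPO-style assumption, not stated in the abstract).
    advantages = rewards - pass_rate
    # Entropy peaks when the model solves the problem about half the time, so
    # updates concentrate on problems at the edge of the model's current ability.
    weight = binary_entropy(pass_rate) / np.log(2.0)  # normalized to [0, 1]
    return weight * advantages

# Example: 8 sampled solutions to one problem, 3 verified correct.
rollout_rewards = np.array([1, 0, 0, 1, 0, 1, 0, 0], dtype=float)
print(maxent_weighted_advantages(rollout_rewards))

Under this reading, problems the model always solves or always fails contribute nothing to the update, which is one way a 1.5B-parameter model's limited capacity could be focused on the "correct signal" the abstract describes; whether this matches the paper's formulation would need to be checked against the full text.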