

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

November 9, 2025
Authors: Sen Xu, Yi Zhou, Wei Wang, Jixin Min, Zhibin Yin, Yingwei Dai, Shixi Liu, Lianyu Pang, Yirong Chen, Junlin Zhang
cs.AI

Abstract

Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). It counters the dominant strategy of scaling parameter counts to enhance capability, exemplified by models such as DeepSeek R1 (671B) and Kimi k2 (>1T). The SSP framework first employs Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, then applies MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal. With a total training cost of only $7,800, VibeThinker-1.5B demonstrates reasoning capabilities superior to closed-source models such as Magistral Medium and Claude Opus 4, and performs on par with open-source models such as GPT OSS-20B Medium. Remarkably, it surpasses DeepSeek R1, a model over 400x larger, on three math benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7). This is a substantial improvement over its base model's scores of 6.7, 4.3, and 0.6, respectively. On LiveCodeBench V6, it scores 51.1, outperforming both Magistral Medium (50.3) and its base model (0.0). These findings demonstrate that small models can achieve reasoning capabilities comparable to those of large models, drastically reducing training and inference costs and thereby democratizing advanced AI research.
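
The abstract names MaxEnt-Guided Policy Optimization only at a high level, so the following is a minimal, hypothetical sketch of one way an entropy signal could guide the RL stage: weighting each training problem by the binary entropy of its empirical pass rate over sampled rollouts, so that problems the model solves roughly half the time (where uncertainty, and hence learning signal, is greatest) receive the most emphasis. The function names and the specific weighting scheme are illustrative assumptions, not the paper's published algorithm.

```python
# Illustrative sketch (assumption): entropy-guided weighting of training problems.
# For each problem, sample K rollouts, estimate the empirical pass rate p, and
# weight that problem's policy-gradient signal by the binary entropy H(p),
# which peaks at p = 0.5, i.e., problems the model is most uncertain about.
import math

def binary_entropy(p: float, eps: float = 1e-8) -> float:
    """H(p) for a Bernoulli pass/fail outcome, in nats."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def maxent_weights(pass_rates: list[float]) -> list[float]:
    """Normalize per-problem entropies into sampling/advantage weights."""
    entropies = [binary_entropy(p) for p in pass_rates]
    total = sum(entropies) or 1.0
    return [e / total for e in entropies]

# Example: three problems with empirical pass rates 0.05, 0.5, and 0.95.
# The mid-difficulty problem (p = 0.5) receives the largest weight.
print(maxent_weights([0.05, 0.5, 0.95]))
```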