CWM: An Open-Weights LLM for Research on Code Generation with World Models
September 30, 2025
Authors: FAIR CodeGen team, Quentin Carbonneaux, Gal Cohen, Jonas Gehring, Jacob Kahn, Jannik Kossen, Felix Kreuk, Emily McMilin, Michel Meyer, Yuxiang Wei, David Zhang, Kunhao Zheng, Jordi Armengol-Estapé, Pedram Bashiri, Maximilian Beck, Pierre Chambon, Abhishek Charnalia, Chris Cummins, Juliette Decugis, Zacharias V. Fisches, François Fleuret, Fabian Gloeckle, Alex Gu, Michael Hassid, Daniel Haziza, Badr Youbi Idrissi, Christian Keller, Rahul Kindi, Hugh Leather, Gallil Maimon, Aram Markosyan, Francisco Massa, Pierre-Emmanuel Mazaré, Vegard Mella, Naila Murray, Keyur Muzumdar, Peter O'Hearn, Matteo Pagliardini, Dmitrii Pedchenko, Tal Remez, Volker Seeker, Marco Selvi, Oren Sultan, Sida Wang, Luca Wehrstedt, Ori Yoran, Lingming Zhang, Taco Cohen, Yossi Adi, Gabriel Synnaeve
cs.AI
Abstract
We release Code World Model (CWM), a 32-billion-parameter open-weights LLM,
to advance research on code generation with world models. To improve code
understanding beyond what can be learned from training on static code alone, we
mid-train CWM on a large corpus of observation-action trajectories from Python
interpreter and agentic Docker environments, and perform extensive multi-task
reasoning RL in verifiable coding, math, and multi-turn software engineering
environments. With CWM, we provide a strong testbed for researchers to explore
the opportunities world modeling affords for improving code generation with
reasoning and planning in computational environments. We present first steps of
how world models can benefit agentic coding, enable step-by-step simulation of
Python code execution, and show early results of how reasoning can benefit from
the latter. CWM is a dense, decoder-only LLM trained with a context size of up
to 131k tokens. Independent of its world modeling capabilities, CWM offers
strong performance on general coding and math tasks: it reaches pass@1 scores
of 65.8% on SWE-bench Verified (with test-time scaling), 68.6% on
LiveCodeBench, 96.6% on Math-500, and 76.0% on AIME 2024. To support further
research on code world modeling, we release model checkpoints after
mid-training, SFT, and RL.
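
To make the notion of an observation-action trajectory concrete: the idea is to pair each action (an executed line of code) with the resulting observation (the interpreter's state). The sketch below is a rough illustration of this general idea, not CWM's actual trace format or data pipeline (which the paper defines); it uses Python's standard sys.settrace hook, and the helper names trace_execution and demo are hypothetical.

```python
import sys
import inspect

def trace_execution(fn, *args):
    """Run fn(*args), recording (lineno, source line, locals snapshot) per step."""
    source_lines, start = inspect.getsourcelines(fn)
    steps = []

    def tracer(frame, event, arg):
        # Record only 'line' events inside fn's own frame: each executed
        # source line is the "action", the locals snapshot the "observation".
        if event == "line" and frame.f_code is fn.__code__:
            action = source_lines[frame.f_lineno - start].rstrip()
            steps.append((frame.f_lineno, action, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)  # always detach the tracer
    return steps

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

for lineno, action, obs in trace_execution(demo, 3):
    print(f"{action.strip():<24} -> {obs}")
```

Serialized, such step-by-step traces of code alongside interpreter state are the kind of data that lets a model learn to simulate execution rather than merely complete static source text.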